Planet Linux Australia

Syndicate content
Planet Linux Australia - http://planet.linux.org.au
Updated: 21 min 59 sec ago

Matthew Oliver: Keystone Federated Swift – Multi-region cluster, multiple federation, access same account

Fri, 2018-08-31 20:05

Welcome to the final post in the series, it has been a long time coming. If required/requested I’m happy to delve into any of these topics deeper, but I’ll attempt to explain the situation, the best approach to take and how I got a POC working, which I am calling the brittle method. It definitely isn’t the best approach but as it was solely done on the Swift side and as I am a OpenStack Swift dev it was the quickest and easiest for me when preparing for the presentation.

To first understand how we can build a federated environment where we have access to our account no matter where we go, we need to learn about how keystone authentication works from a Swift perspective. Then we can look at how we can solve the problem.

Swift’s Keystoneauth middleware

As mentioned in earlier posts, there isn’t any magic in the way Swift authentication works. Swift is an end-to-end storage solution and so authentication is handled via authentication middlewares. further a single Swift cluster can talk to multiple auth backends, which is where the `reseller_prefix` comes into play. This was the first approach I blogged about in these series.

 

There is nothing magical about how authentication works, keystoneauth has it’s own idiosyncrasies, but in general it simply makes a decision whether this request should be allowed. It makes writing your own simple, and maybe an easily way around the problem. Ie. write an auth middleware to auth directly to your existing company LDAP server or authentication system.

 

To setup keystone authentication, you use keystones authtoken middleware and directly afterwards in the pipeline place the Swift keystone middleware, configuring each of them in the proxy configuration:

pipeline = ... authtoken keystoneauth ... proxy-server The authtoken middleware

Generally every request to Swift will include a token, unless it’s using tempurl, container-sync or to a container that has global read enabled but you get the point.

As the swift-proxy is a python wsgi app the request first hits the first middleware in the pipeline (left most) and works it’s way to the right. When it hits the authtoken middleware the token in the request will be sent to keystone to be authenticated.

The resulting metadata, ie the user, storage_url, groups, roles etc, and dumped into the request environment and then passed to the next middleware. The keystoneauth middleware.

The keystoneauth middleware

The keystoneauth middleware checks the request environment for the metadata dumped by the authtoken middleware and makes a decision based on that. Things like:

  • If the token was one for one of the reseller_admin roles, then they have access.
  • If the user isn’t a swift user of the account/project the request is for, is there an ACL that will allow it.
  • If the user has a role that identifies them as a swift user/operator of this Swift account then great.

 

When checking to see if the user has access to the given account (Swift account) it needs to know what account the request is for. This is easily determined as it’s defined by the path of the URL your hitting. The URL you send to the Swift proxy is what we call the storage url. And is in the form of:

http(s)://<url of proxy or proxy vip>/v1/<account>/<container>/<object>

The container and object elements are optional as it depends on what your trying to do in Swift. When the keystoneauth middleware is authenticating it’ll check that the project_id (or tenant_id) metadata dumped by authtoken, when this is concatenated with the reseller_prefix, matches the account in the given storage_url. For example let’s say the following metadata was dumped by authtoken:

{ "X_PROJECT_ID": 'abcdefg12345678', "X_ROLES": "swiftoperator", ... }

And the reseller_prefix for keystone auth was AUTH_ and we make any member of the swiftoperator role (in keystone) a swift operator (a swift user on the account). Then keystoneauth would allow access if the account in the storage URL matched AUTH_abcdefg12345678.

 

When you authenticate to keystone the object storage endpoint will point not only to the Swift endpoint (the swift proxy or swift proxy load balancer), but it will also include your account. Based on your project_id. More on this soon.

 

Does that make sense? So simply put to use keystoneauth in a multi federated environment, we just need to make sure no matter which keystone we end up using and asking for the swift endpoint always returns the same Swift account name.

And there lies our problem, the keystone object storage endpoint and the metadata authtoken dumps uses the project_id/tenant_id. This isn’t something that is synced or can be passed via federation metadata.

NOTE: This also means that you’d need to use the same reseller_prefix on all keystones in every federated environment. Otherwise the accounts wont match.

 

Keystone Endpoint and Federation Side

When you add an object storage endpoint in keystone, for swift, the url looks something like:

http://swiftproxy:8080/v1/AUTH_$(tenant_id)s

 

Notice the $(tenant_id)s at the end? This is a placeholder that keystone internally will replace with the tenant_id of the project you authenticated as. $(project_id)s can also be used and maps to the same thing. And this is our problem.

When setting up federation between keystones (assuming keystone 2 keystone federation) you generate a mapping. This mapping can include the project name, but not the project_id. Theses ids are auto-generated, not deterministic by name, so creating the same project on different federated keystone servers will have different project_id‘s. When a keystone service provider (SP) federates with a keystone identity provider (IdP) the mapping they share shows how the provider should map federated users locally. This includes creating a shadow project if a project doesn’t already exist for the federated user to be part of.

Because there is no way to sync project_id’s in the mapping the SP will create the project which will have a unique project_id. Meaning when the federated user has authenticated their Swift storage endpoint from keystone will be different, in essence as far as Swift is concerned they will have access but to a completely different Swift account. Let’s use an example, let’s say there is a project on the IdP called ProjectA.

project_name project_id IdP ProjectA 75294565521b4d4e8dc7ce77a25fa14b SP ProjectA cb0d5805d72a4f2a89ff260b15629799

Here we have a ProjectA on both IdP and SP. The one on the SP would be considered a shadow project to map the federated user too. However the project_id’s are both different, because they are uniquely  generated when the project is created on each keystone environment. Taking the Object Storage endpoint in keystone as our example before we get:

 

   Object Storage Endpoint IdP http://swiftproxy:8080/v1/AUTH_75294565521b4d4e8dc7ce77a25fa14b SP http://swiftproxy:8080/v1/AUTH_cb0d5805d72a4f2a89ff260b15629799

So when talking to Swift you’ll be accessing different accounts, AUTH_75294565521b4d4e8dc7ce77a25fa14b and AUTH_cb0d5805d72a4f2a89ff260b15629799 respectively. This means objects you write in one federated environment will be placed in a completely different account so you wont be able access them from elsewhere.

 

Interesting ways to approach the problem

Like I stated earlier the solution would simply be to always be able to return the same storage URL no matter which federated environment you authenticate to. But how?

  1. Make sure the same project_id/tenant_id is used for _every_ project with the same name, or at least the same name in the domains that federation mapping maps too. This means direct DB hacking, so not a good solution, we should solve this in code, not make OPs go hack databases.
  2. Have a unique id for projects/tenants that can be synced in federation mapping, also make this available in the keystone endpoint template mapping, so there is a consistent Swift account to use. Hey we already have project_id which meets all the criteria except mapping, so that would be easiest and best.
  3. Use something that _can_ be synced in a federation mapping. Like domain and project name. Except these don’t map to endpoint template mappings. But with a bit of hacking that should be fine.

Of the above approaches, 2 would be the best. 3 is good except if you pick something mutable like the project name, if you ever change it, you’d now authenticate to a completely different swift account. Meaning you’d have just lost access to all your old objects! And you may find yourself with grumpy Swift Ops who now need to do a potentially large data migration or you’d be forced to never change your project name.

Option 2 being unique, though it doesn’t look like a very memorable name if your using the project id, wont change. Maybe you could offer people a more memorable immutable project property to use. But to keep the change simple being able simply sync the project_id should get us everything we need.

 

When I was playing with this, it was for a presentation so had a time limit, a very strict one, so being a Swift developer and knowing the Swift code base I hacked together a varient on option 3 that didn’t involve hacking keystone at all. Why, because I needed a POC and didn’t want to spend most my time figuring out the inner workings of Keystone, when I could just do a few hacks to have a complete Swift only version. And it worked. Though I wouldn’t recommend it. Option 3 is very brittle.

 

The brittle method – Swift only side – Option 3b

Because I didn’t have time to simply hack keystone, I took a different approach. The basic idea was to let authtoken authenticate and then finish building the storage URL on the swift side using the meta-data authtoken dumps into wsgi request env. Thereby modifying the way keystoneauth authenticates slightly.

Step 1 – Give the keystoneauth middleware the ability to complete the storage url

By default we assume the incoming request will point to a complete account, meaning the object storage endpoint in keystone will end with something like:

'<uri>/v1/AUTH_%(tenant_id)s'

So let’s enhance keystoneauth to have the ability to if given only the reseller_prefix to complete the account. So I added a use_dynamic_reseller option.

If you enable use_dynamic_reseller then the keystoneauth middleware will pull the project_id from authtoken‘s meta-data dumped in the wsgi environment. This allows a simplified keystone endpoint in the form:

'<uri>/v1/AUTH_'

This shortcut makes configuration easier, but can only be reliably used when on your own account and providing a token. API elements like tempurl  and public containers need the full account in the path.

This still used project_id so doesn’t solve our problem, but meant I could get rid of the $(tenant_id)s from the endpoints. Here is the commit in my github fork.

Step 2 – Extend the dynamic reseller to include completing storage url with names

Next, we extend the keystoneauth middleware a little bit more. Give it another option, use_dynamic_reseller_name, to complete the account with either project_name or domain_name and project_name but only if your using keystone authentication version 3.

If you are, and want to have an account based of the name of the project, then you can enable use_dynamic_reseller_name in conjuction with use_dynamic_reseller to do so. The form used for the account would be:

<reseller_prefix><project_domain_name>_<project_name>

So using our example previously with a reseller_preix of AUTH_, a project_domain_name of Domain and our project name of ProjectA, this would generate an account:

AUTH_Domain_ProjectA

This patch is also in my github fork.

Does this work, yes! But as I’ve already mentioned in the last section, this is _very_ brittle. But this also makes it confusing to know when you need to provide only the reseller_prefix or your full account name. It would be so much easier to just extend keystone to sync and create shadow projects with the same project_id. Then everything would just work without hacking.

Clinton Roy: Moving to Melbourne

Fri, 2018-08-31 18:00

Now that the paperwork has finally all been dealt with, I can announce that I’ll be moving down to Melbourne to take up a position with the Australian Synchrotron, basically a super duper x-ray machine used for research of all types. My official position is a >in< Senior Scientific Software Engineer <out> I’ll be moving down to Melbourne shortly, staying with friends (you remember that offer you made, months ago?) until I find a rental near Monash Uni, Clayton.

I will be leaving behind Humbug, the computer group that basically opened up my entire career, and The Edge, SLQ, my home-away-from-home study. I do hope to be able to find replacements for these down south.

I’m looking at having a small farewell nearby soon.

A shout out to Netbox Blue for supplying all my packing boxes. Allll of them.

OpenSTEM: This Week in Australian History

Fri, 2018-08-31 16:06
The end of August and beginning of September is traditionally linked to the beginning of Spring in Australia, although the change in seasons is experienced in different ways in different parts of the country and was marked in locally appropriate ways by Aboriginal people. As a uniquely Australian celebration of Spring, National Wattle Day, celebrated […]

Pia Waugh: Mā te wā, Aotearoa

Fri, 2018-08-31 12:01

Today I have some good news and sad news. The good news is that I’ve been offered a unique chance to drive “Digital Government” Policy and Innovation for all of government, an agenda including open government, digital transformation, technology, open and shared data, information policy, gov as a platform, public innovation, service innovation and policy innovation. For those who know me, these are a few of my favourite things

The sad news, for some folk anyway, is I need to leave New Zealand Aotearoa to do it.

Over the past 18 months (has it only been that long!) I have been helping create a revolutionary new way of doing government. We have established a uniquely cross-agency funded and governed all-of-government function, a “Service Innovation Lab”, for collaborating on the design and development of better public services for New Zealand. By taking a “life journey” approach, government agencies have a reason to work together to improve the full experience of people rather than the usual (and natural) focus on a single product, service or portfolio. The Service Innovation Lab has a unique value in providing an independent place and way to explore design-led and evidence-based approaches to service innovation, in collaboration with service providers across public, private and non-profit sectors. You can see everything we’ve done over the past year here  and from the first 10 week experiment here. I have particularly enjoyed working with and exploring the role of the Citizen Advice Bureau in New Zealand as a critical and trusted service delivery organisation in New Zealand. I’m also particularly proud of both our work in exploring optimistic futures as a way to explore what is possible, rather than just iterate away from pain, and our exploration of better rules for government including legislation as code. The next stage for the Lab is very exciting! As you can see in the 2017-18 Final Report, there is an ambitious work programme to accelerate the delivery of more integrated and more proactive services, and the team is growing with new positions opening up for recruitment in the coming weeks!

Please see the New Zealand blog (which includes my news) here

Professionally, I get most excited about system transformation. Everything we do in the Lab is focused on systemic change, and it is doing a great job at having an impact on the NZ (and global) system around it, especially for its size. But a lot more needs to be done to scale both innovation and transformation. Personally, I have a vision for a better world where all people have what they need to thrive, and I feel a constant sense of urgency in transitioning our public institutions into the 21st century, from an industrial age to the information age, so they can more effectively support society as the speed of change and complexity exponentially grows. This is going to take a rethink of how the entire system functions, especially at the policy and legislative levels.

With this in mind, I have been offered an extraordinary opportunity to explore and demonstrate systemic transformation of government. The New South Wales Department of Finance, Services and Innovation (NSW DFSI) has offered me the role of Executive Director for Digital Government, a role responsible for the all-of-government policy and innovation for ICT, digital, open, information, data, emerging tech and transformation, including a service innovation lab (DNA). This is a huge opportunity to drive systemic transformation as part of a visionary senior leadership team with Martin Hoffman (DFSI Secretary) and Greg Wells (GCIDO). I am excited to be joining NSW DFSI, and the many talented people working in the department, to make a real difference for the people of NSW. I hope our work and example will help raise the bar internationally for the digital transformation of governments for the benefit of the communities we serve.

Please see the NSW Government media release here.

One of the valuable lessons from New Zealand that I will be taking forward in this work has been in how public services can (and should) engage constructively and respectfully with Indigenous communities, not just because they are part of society or because it is the right thing to do, but to integrate important principles and context into the work of serving society. Our First Australians are the oldest cluster of cultures in the world, and we have a lot to learn from them in how we live and work today.

I want to briefly thank the Service Innovation team, each of whom is utterly brilliant and inspiring, as well as the wonderful Darryl Carpenter and Karl McDiarmid for taking that first leap into the unknown to hire me and see what we could do. I think we did well I’m delighted that Nadia Webster will be taking over leading the Lab work and has an extraordinary team to take it forward. I look forward to collaborating between New Zealand and New South Wales, and a race to the top for more inclusive, human centred, digitally enabled and values drive public services.

My last day at the NZ Government Service Innovation Lab is the 14th September and I start at NSW DFSI on the 24th September. We’ll be doing some last celebratory drinks on the evening of the 13th September so hold the date for those in Wellington. For those in Sydney, I can’t wait to get started and will see you soon!

David Rowe: Simple Keras “Hello World” Example – Mean Removal

Thu, 2018-08-30 14:04

Inspired by the Wavenet work with Codec 2 I’m dipping my toe into the word of Deep Learning (DL) using Keras. I’ve read Deep Learning with Python (quite an enjoyable read) and set up a Linux box with a GTX graphics card that is making my teenage sons weep with jealousy.

So after a couple of days of messing about here is my first “hello world” Keras example: mean_removal.py. It might be helpful for other Keras noobs. Assuming you have all the packages installed, it runs with either Python 2:

$ python mean_removal.py

Or Python 3:

$ python3 mean_removal.py

It removes the mean from vectors, using just a single layer regression model. The script runs OK on a regular PC without a chunky graphics card.

So I start by generating vectors from random numbers with a zero mean. I then add a random offset to each sample in the vector. Here are 5 vectors with random offsets added to them:

The Keras script is then trained to estimate and remove the offsets, so the output vectors look like:

Estimating the offset is the same as finding the “mean” of the vector. Yes I know we can do that with a “mean” function, but where’s the fun in that!

Here are some other plots that show the training and validation measures, and error metrics at the output:



The last two plots show pretty much all of the offset is removed and it restores the original (non offset) vectors with just a tiny bit of noise. I had to wind back the learning rate to get it to converge without “NAN” losses, possibly as I’m using fairly big input numbers. I’m familiar with the idea of learning rate from NLMS adaptive filters, such as those used for my work in echo cancellation.

Deep Learning for Codec 2

My initial ambitions for DL are somewhat more modest than the sample-by-sample synthesis used in the Wavenet work. I have some problems with Vector Quantisation (VQ) in the low rate Codec 2 modes. The VQ is used to compactly describe the speech spectrum, which carries the intelligibility of the signal.

The VQ gets upset with different microphones, speakers, or minor spectral shaping like gentle high pass/low pass filtering. This shaping often causes a poor vector to be chosen, which results in crappy speech quality. The current VQ error measure can’t tell the difference between spectral features that matter and those that don’t.

So I’d like to try DL to address those issues, and train a system to say “look, this speech and this speech are actually the same. Yes I know one of them has a 3dB/octave low pass filter, please ignore that”.

As emphasised in the text book above, some feature extraction can help with DL problems. For my first pass I’ll be working on parameters extracted by the Codec 2 model (like a compact version of the speech spectrum) rather than speech samples like Wavenet. This will reduce my CPU load significantly, at the expense of speech quality, which will be limited by the unquantised Codec 2 model. But that’s OK as a first step. A notch or two up on Codec 2 at 700 bit/s would be very useful, especially if it can run on a CPU from the first two decades of the 21st century.

Mean Removal on Speech Vectors

So to get started with Keras I chose mean removal. The mean level or constant offset is like the volume or energy in a speech signal, its the simplest form of spectral shaping I could imagine. I trained and tested it with vectors of random numbers, using numbers in the range of the speech spectral samples that Codec 2 plays with.

It’s a bit like an equaliser, vectors with arbitrary spectral shaping go in, “flat” unshaped vectors come out. They can then be sent to a Vector Quantiser. There are probably smarter ways to do this, but I need to start somewhere.

So as a next step I tried offset removal with vectors that represent the spectrum of 40ms speech frame:


This is pretty cool – the network was trained on random numbers but works well with real speech frames. You can also see the spectral slope I mentioned above, the speech energy gradually falls off at high frequencies. This doesn’t affect the intelligibility of the speech but tends to upset traditional Vector Quantisers. Especially mine.

Now that I have something super-basic working, the next step is to train and test networks to deal with some non-trivial spectral shaping.

Reading Further

Deep Learning with Python
WaveNet and Codec 2
Codec 2 700C, the current Codec 2 700 bit/s mode. With better VQ we can improve on this.
Codec 2 at 450 bit/s, some fine work from Thomas and Stefan, that uses a form of machine learning to synthesise 16 kHz speech from 8 kHz inputs.
FreeDV 700D, the recently released FreeDV mode that uses Codec 2 700C. A FreeDV Mode also includes a modem, FEC, protocol.
RNNoise: Learning Noise Suppression, Jean-Marc’s DL network for noise reduction. Thanks Jean-Marc for the brainstorming emails!

Michael Still: What’s missing from the ONAP community — an open design process

Thu, 2018-08-30 14:00

I’ve been thinking a fair bit about ONAP and its future releases recently. This is in the context of trying to implement a system for a client which is based on ONAP. Its really hard though, because its hard to determine how various components of ONAP are intended to work, or interoperate.

It took me a while, but I’ve realised what’s missing here…

OpenStack has an open design process. If you want to add a new feature to Nova for example, the first step is you need to write down what the feature is intended to do, how it integrates with the rest of Nova, and how people might use it. The target audience for that document is both the Nova development team, but also people who operate OpenStack deployments.

ONAP has no equivalent that I can find. So for example, they say that in Casablanca they are going to implement a “AAI Enricher” to ease lookup of data from external systems in their inventory database, but I can’t find anywhere where they explain how the integration between arbitrary external systems and ONAP AAI will work.

I think ONAP would really benefit from a good hard look at their design processes and how approachable they are for people outside their development teams. The current use case proposal process (videos, conference talks, and powerpoint presentations) just isn’t great for people who are trying to figure out how to deploy their software.

Linux Users of Victoria (LUV) Announce: Software Freedom Day 2018 and LUV AGM

Wed, 2018-08-29 20:03
Start: Sep 15 2018 13:00 End: Sep 15 2018 17:00 Start: Sep 15 2018 13:00 End: Sep 15 2018 17:00 Location:  Electron Workshop, 31 Arden Street North Melbourne 3051 Link:  https://www.openstreetmap.org/node/2556615434

It's time once again to get excited about all the benefits that Free and Open Source Software have given us over the past year and get together to talk about how Freedom and Openness can improve our human rights, our privacy, our security and our communities. It's Software Freedom Day!

Linux Users of Victoria is a subcommittee of Linux Australia.

September 15, 2018 - 13:00

read more

Linux Users of Victoria (LUV) Announce: LUV September 2018 Main Meeting: New Developments in Supercomputing

Wed, 2018-08-29 20:03
Start: Sep 4 2018 18:30 End: Sep 4 2018 20:30 Start: Sep 4 2018 18:30 End: Sep 4 2018 20:30 Location:  Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053 Link:  http://www.melbourne.vic.gov.au/community/hubs-bookable-spaces/kathleen-syme-lib...

PLEASE NOTE RETURN TO ORIGINAL START TIME

6:30 PM to 8:30 PM Tuesday, September 4, 2018
Training Room, Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

Speakers:

Many of us like to go for dinner nearby after the meeting, typically at Trotters Bistro in Lygon St.  Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

September 4, 2018 - 18:30

read more

David Rowe: Band Pass Filter and Power Amplifier for Simple HF Data

Wed, 2018-08-29 12:05

Is it possible to move data over HF radio using very simple, low cost hardware and clever SDR software? In the last few posts (here and here) I’ve been constructing and testing building blocks for a simple HF data terminal. This post describes a few more, a 3-8 MHz Band Pass Filter (BPF) and 1W Power Amplifier (PA).

Band Pass Filter

The RTL-SDR samples at 28.8 MHz, capturing a broad chunk of spectrum. In direct mode we just sample the Q-channel, so any energy above 14.4 MHz will be aliased into our passband; e.g. both 21 and 7 MHz will appear as a 7 MHz sampled signal.

In the previous post we determined the ADC overloads at -30dBm, so we want to remove any strong signals above or near that level. One source of strong signals is broadcast band AM radio between 500 to 1600 kHz.

The use case is “100 mile” data links so I’d like the receiver to work on the 80M (3.5 MHz) as well as 40M (7.1 MHz) bands, which sets the BPF passband at 3 to 8 MHz. I hooked up my spec-an to a 40M antenna and could see AM broadcast signals peaking at -40dBm, so I set a BPF specification of > 20dB attenuation at 1.5 MHz to keep the sum of all those signals well away from the -30dBm limit. At the high frequency end I specified at > 30dB attenuation at 21 MHz, to reduce any energy aliased down to 7 MHz.

I designed a cascaded High Pass Low Pass/Filter using some tables from my ancient (but still excellent) copy of “RF Circuit Design”, by Chris Bowick. The Octave rtl_sdr script does the calculations for me. A spreadsheet would work well too.

I simulated the BPF using LTSpice, fixed a few bugs, and tweaked it for real world component values. Here is the circuit and frequency response on log and linear scales:



I soldered up the BPF Manhattan style using commercial axial 1uH inductors and ceramic capacitors, then tested it using the spec-an and tracking generator (note linear scale):

The table at the bottom shows the measured attenuation at some important frequencies. The attenuation is a bit low at 21 MHz, perhaps due to the finite Q of the real world inductors. Quite a good match to the LTSpice simulation and close enough for my experiments. The little step at around 10 MHz is a tracking generator artefact.

The next plot shows the effect of the BPF when my spec-an is connected to my 40M dipole (0 to 10MHz span). Yellow is the received signal without the filter, purple with the filter.

The big spike around 0 Hz is an artefact on the spec-an. The filter is doing a good job of nailing the AM broadcast band energy. You can see a peak around 7.4 MHz where the dipole is resonant. Actually this is a bit of a surprise to me, as I want it resonant around 7.2MHz, I better check that out! At 7.2-ish the insertion loss (difference between the purple and yellow) is a few dB as per the tracking generator plot above. It’s more like 6dB at 7.4 MHz (the dipole peak), not quite sure why. An insertion loss of 3dB at 7.2 MHz is OK for my application.

Power Amplifier

A few weeks ago I hooked the rpitx to my 40M dipole and managed to demodulate the 11mW signal a few km away (over an urban channel) using a mag loop and my FT-817. I decided to build a small 1W PA to make the system usable over “100 mile” HF channels. The actual power is not that critical, as we can trade power off against bit rate. For example if a given HF channel supports 100 bit/s at 1W, we then know we can get 1000 bit/s at 10W.

Even low bit rates can be very useful if you have no other communication. A text message or Tweet, allowing for some overhead, averages about 1000 bits. So at 1000 bit/s you can send 1 txt per second, 3600 an hour, or 86,000/day. That’s very useful communication if you are in a disaster situation and want to tell family you are alive. Or perhaps live in a remote area with no other communication. Of course HF channels come and go, so the actual throughput will be less than that.

I explored the junk box and found a partially constructed Beach 40. I isolated the driver and PA stage and poked it with my signal generator. Turns out it had a bit too much gain (the rpitx has plenty of drive) so I ended up with this simple PA circuit:



The only spurious output I can see is the 2nd harmonic is at -44 dBC, meeting ACMA specs:

The low pass filter at the output has a 3dB point at about 10 MHz which is a little high. It could be brought down a little to increase stop-band attenuation and reduce the 2nd harmonic further. I haven’t done anything about impedance matching the input, as it hits 1W (30dBm) output with 14dBm drive from the rpitx. The 1 inch square heatsink is quite warm after 10 minutes but I can still hold it. It’s not very efficient, 2.9W DC input power for 1W out, however 16dB power gain is quite good for a PA. Anyhoo, it’s a fine starting point for my experiments, we can optimise the PA later if necessary.

Next Steps

OK, so I have most of the building blocks I need for some over the air HF data experiments. There was a bit of engineering involved in building the BPF and PA, but the designs are very simple and can be constructed for a few $ or even from road kill (recycled) components. We now have a very low cost HF data radio, running high performance modems, connected to a Linux computer and Wifi.

Next I will put some software together to estimate data throughput, set the system up with real antennas, and gather some experimental results over real world HF channels.

Reading Further

Rpitx and 2FSK, first part in this series.
Testing a RTL-SDR with FSK on HF, second part in this series.
rtl_sdr.m script that calculates component values for the BPF.

Gary Pendergast: Forking is a Feature

Sun, 2018-08-26 16:04

There’s a new WordPress fork called ClassicPress that’s been making some waves recently, with various members of the Twitterati swinging between decrying it as an attempt to fracture the WordPress community, to it being an unnecessary over-reaction, to it being a death knell for WordPress.

Personally, I don’t think it’s any of the above.

Some years ago, Anil Dash wrote an article on this topic (which I totally ripped forked the name from), you should read it for some context.

Forking is a Feature

While Linus Torvalds is best known as the creator of Linux, it’s one of his more geeky creations, and the social implications of its design, that may well end up being his greatest legacy. Because Linus has, in just a few short years, changed the social dynamic around forking,

Anil Dash

With that context, I genuinely applaud ClassicPress for exercising their fundamental rights under the GPL. The WordPress Bill of Rights makes it quite clear that forking is not just allowed, it’s encouraged. You can and should fork WordPress if you choose to. This isn’t a flaw in the system, this is how it’s supposed to work.

Forks should aways be encouraged.

Forks are a fundamentally healthy aspect of Open Source software. A relatively recent example is the io.js fork of Node.js, which resulted in significant changes to how the Node.js project is governed and developed. WordPress has seen forks in the past, too: Lyceum was a fork that added multi-site support, before it existed in WordPress. WordPress MU was something of a sibling fork which also added multi-site support, and was ultimately merged back into WordPress.

There are examples of forks that went on to become independent projects: WordPress itself is a fork of cafelog/b2. X.org is a fork of XFree86. LibreOffice is a fork of OpenOffice. Blink is a fork of WebKit, which in turn is a fork of KHTML. MariaDB is a fork of MySQL. XBMC has been forked dozens of times. Joomla is a fork of Mambo. (Fun historical coincidence: I very nearly accepted a job offer from Miro, the company behind Mambo, just a couple of months before Joomla came into being!)

Maintaining a fork is hard, thankless work.

All of these independent forks have a common thread: they started with a group of people who were highly experienced in building the software they were forking (often comprising of core developers of the original software). That’s not to say that non-core developers can’t drive a fork, but it does seem to require fairly fundamental knowledge of the strengths and weaknesses of the software, in order to successfully fork it into an independent product.

From a practical perspective, I can tell you that maintaining a fork of WordPress would require an extraordinary amount of work. For example, WordPress.com effectively maintains a fork (which happens to almost exactly match the Core codebase) of WordPress. The task of maintaining this fork falls to a talented team of devops folks, who review and merge each patch.

Now, WordPress.com is really only an internal fork. To maintain a product fork of WordPress would require so much more effort. You’d need to maintain the web infrastructure to push out updates. As the fork diverges from WordPress Core, you would need to figure out how to maintain plugin and theme compatibility. You’d likely need to do your own bug and security fixes, on top of what’s merged from WordPress.

I’m not saying this to dissuade anyone from forking WordPress, rather, it’s important to go into this aware of the challenges that lay ahead. For anyone who uses a fork (whether it be a fork of WordPress, or any other software product), I’m sure the maintainer would appreciate a word of thanks for the work they’ve done to make it possible.

Dave Hall: AWS Parameter Store

Sun, 2018-08-26 02:03

Anyone with a moderate level of AWS experience will have learned that Amazon offers more than one way of doing something. Storing secrets is no exception. 

It is possible to spin up Hashicorp Vault on AWS using an official Amazon quick start guide. The down side of this approach is that you have to maintain it.

If you want an "AWS native" approach, you have 2 services to choose from. As the name suggests, Secrets Manager provides some secrets management tools on top of the store. This includes automagic rotation of AWS RDS credentials on a regular schedule. For the first 30 days the service is free, then you start paying per secret per month, plus API calls.

There is a free option, Amazon's Systems Manager Parameter Store. This is what I'll be covering today.

Structure

It is easy when you first start out to store all your secrets at the top level. After a while you will regret this decision. 

Parameter Store supports hierarchies. I recommend using them from day one. Today I generally use /[appname]-[env]/[KEY]. After some time with this scheme I am finding that /[appname]/[env]/[KEY] feels like it will be easier to manage. IAM permissions support paths and wildcards, so either scheme will work.

If you need to migrate your secrets, use Parameter Store namespace migration script

Access Controls

Like most Amazon services IAM controls access to Parameter Store. 

Parameter Store allows you to store your values as plain text or encrypted using a key using KMS. For encrypted values the user must have have grants on the parameter store value and KMS key. For consistency I recommend encrypting all your parameters.

If you have a monolith a key per application per envionment is likely to work well. If you have a collection of microservices having a key per service per environment becomes difficult to manage. In this case share a key between several services in the same environment.

Here is an IAM policy for an Lambda function to access a hierarchy of values in parameter store:

{   "Version":"2012-10-17",   "Statement":[     {       "Sid":"ReadParams",       "Effect":"Allow",       "Action":[         "ssm:GetParametersByPath"       ],       "Resource":"arn:aws:ssm:us-east-1:1234567890:parameter/my-app/dev/*"     },     {       "Sid":"Decrypt",       "Effect":"Allow",       "Action":[         "kms:Decrypt"       ],       "Resource":"arn:aws:kms:us-east-1:1234567890:key/20180823-7311-4ced-bad5-653587846973"     }   ] }

To allow your developers to manage the parameters in dev you will need a policy that looks like this:

{   "Version":"2012-10-17",   "Statement":[     {       "Sid":"ManageParams",       "Effect":"Allow",       "Action":[         "ssm:DeleteParameter",         "ssm:DeleteParameters",         "ssm:GetParameter",         "ssm:GetParameterHistory",         "ssm:GetParametersByPath",         "ssm:GetParameters",         "ssm:PutParameter"       ],       "Resource":"arn:aws:ssm:us-east-1:1234567890:parameter/my-app/dev/*"     },     {       "Sid":"ListParams",       "Effect":"Allow",       "Action":"ssm:DescribeParameters",       "Resource":"*"     },     {       "Sid":"DecryptEncrypt",       "Effect":"Allow",       "Action":[         "kms:Decrypt",         "kms:Encrypt"       ],       "Resource":"arn:aws:kms:us-east-1:1234567890:key/20180823-7311-4ced-bad5-653587846973"     }   ] }

Amazon has great documentation on controlling access to Parameter Store and KMS.

Adding Parameters

Amazon allows you to store almost any string up to 4Kbs in length in the Parameter store. This gives you a lot of flexibility.

Parameter Store supports deep hierarchies. You will find this becomes annoying to manage. Use hierarchies to group your values by application and environment. Within the heirarchy use a flat structure. I recommend using lower case letters with dashes between words for your paths. For the parameter keys use upper case letters with underscores. This makes it easy to differentiate the two when searching for parameters. 

Parameter store encodes everything as strings. There may be cases where you want to store an integer as an integer or a more complex data structure. You could use a naming convention to differentiate your different types. I found it easiest to encode every thing as json. When pulling values from the store I json decode it. The down side is strings must be wrapped in double quotes. This is offset by the flexibility of being able to encode objects and use numbers.

It is possible to add parameters to the store using 3 different methods. I generally find the AWS web console easiest when adding a small number of entries. Rather than walking you through this, Amazon have good documentation on adding values. Remember to always use "secure string" to encrypt your values.

Adding parameters via boto3 is straight forward. Once again it is well documented by Amazon.

Finally you can maintain parameters in with a little bit of code. In this example I do it with Python.

import boto3 namespace = "my-app" env = "dev" kms_uuid = "20180823-7311-4ced-bad5-653587846973" # Objects must be json encoded then wrapped in quotes because they're stored as strings. parameters = {"key": '"value"', "MY_INT": 1234, "MY_OBJ": '{"name": "value"}'} ssm = boto3.client("ssm") for parameter in parameters: ssm.put_parameter( Name=f"/{namespace}/{env}/{parameter.upper()}", # Everything must go in as a string. Value=str(parameters[parameter]), Type="SecureString", KeyId=kms_uuid, # Use with caution. Overwrite=True, ) Using Parameters

I have used Parameter Store from Python and the command line. It is easier to use it from Python.

My example assumes that it a Lambda function running with the policy from earlier. The function is called my-app-dev. This is what my code looks like:

import json import boto3 def load_params(namespace: str, env: str) -> dict:     """Load parameters from SSM Parameter Store.     :namespace: The application namespace.     :env: The current application environment.     :return: The config loaded from Parameter Store.     """     config = {}     path = f"/{namespace}/env/"     ssm = boto3.client("ssm", region_name="us-east-1")     more = None     args = {"Path": path, "Recursive": True, "WithDecryption": True}     while more is not False:         if more:             args["NextToken"] = more         params = ssm.get_parameters_by_path(**args)         for param in params["Parameters"]:             key = param["Name"].split("/")[3]             config[key] = json.loads(param["Value"])         more = params.get("NextToken", False)     return config

If you want to avoid loading your config each time your Lambda function is called you can store the results in a global variable. This leverages Amazon's feature that doesn't clear global variables between function invocations. The catch is that your function won't pick up parameter changes without a code deployment. Another option is to put in place logic for periodic purging of the cache.

On the command line things are little harder to manage if you have more than 10 parameters. To export a small number of entries as environment variables, you can use this one liner:

$(aws ssm get-parameters-by-path --with-decryption --path /my-app/dev/ | jq -r '.Parameters[] | "export " + (.Name | split("/")[3] | ascii_upcase | gsub("-"; "_")) + "=" + .Value + ";"')

Make sure you have jq installed and the AWS cli installed and configured.

Conclusion

Amazon's System Manager Parameter Store provides a secure way of storing and managing secrets for your AWS based apps. Unlike Hashicorp Vault, Amazon manages everything for you. If you don't need the more advanced features of Secrets Manager you don't have to pay for them. For most users Parameter Store will be adequate.

Michael Still: Learning from the mistakes that even big projects make

Fri, 2018-08-24 16:00

The following is a blog post version of a talk presented at pyconau 2018. Slides for the presentation can be found here (as Microsoft powerpoint, or as PDF), and a video of the talk (thanks NextDayVideo!) is below:

 

OpenStack is an orchestration system for setting up virtual machines and associated other virtual resources such as networks and storage on clusters of computers. At a high level, OpenStack is just configuring existing facilities of the host operating system — there isn’t really a lot of difference between OpenStack and a room full of system admins frantically resolving tickets requesting virtual machines be setup. The only real difference is scale and predictability.

To do its job, OpenStack needs to be able to manipulate parts of the operating system which are normally reserved for administrative users. This talk is the story of how OpenStack has done that thing over time, what we learnt along the way, and what I’d do differently if I had my time again. Lots of systems need to do these things, so even if you never use OpenStack hopefully there are things to be learnt here.

That said, someone I respect suggested last weekend that good conference talks are actionable. A talk full of OpenStack war stories isn’t actionable, so I’ve spent the last week re-writing this talk to hopefully be more of a call to action than just an interesting story. I apologise for any mismatch between the original proposal and what I present here that might therefore exist.Back to the task in hand though — providing control of virtual resources to untrusted users. OpenStack has gone through several iterations of how it thinks this should be done, so perhaps its illustrative to start by asking how other similar systems achieve this. There are lots of systems that have a requirement to configure privileged parts of the host operating system. The most obvious example I can think of is Docker. How does Docker do this? Well… its actually not all that pretty. Docker presents its API over a unix domain socket by default in order to limit control to local users (you can of course configure this). So to provide access to Docker, you add users to the docker group, which owns that domain socket. The Docker documentation warns that “the docker group grants privileges equivalent to the root user“. So that went well.

Docker is really an example of the simplest way of solving this problem — by not solving it at all. That works well enough for systems where you can tightly control the users who need access to those privileged operations — in Docker’s case by making them have an account in the right group on the system and logging in locally. However, OpenStack’s whole point is to let untrusted remote users create virtual machines, so we’re going to have to do better than that.

The next level up is to do something with sudo. The way we all use sudo day to day, you allow users in the sudoers group to become root and execute any old command, with a configuration entry that probably looks a little like this:

# Allow members of group sudo to execute any command %sudo ALL=(ALL:ALL) ALL

Now that config entry is basically line noise, but it says “allow members of the group called sudo, on any host, to run any command as root”. You can of course embed this into your python code using subprocess.call() or similar. On the security front, its possible to do a little bit better than a “nova can execute anything” entry. For example:

%sudo ALL=/bin/ls

This says that the sudo group on all hosts can execute /bin/ls with any arguments. OpenStack never actually specified the complete list of commands it executed. That was left as a job for packagers, which of course meant it wasn’t done well.

So there’s our first actionable thing — if you assume that someone else (packagers, the ops team, whoever) is going to analyse your code well enough to solve the security problem that you can’t be bothered solving, then you have a problem. Now, we weren’t necessarily deliberately punting here. Its obvious to me how to grep the code for commands run as root to add them to a sudo configuration file, but that’s unfair. I wrote some of this code, I am much closer to it than a system admin who just wants to get the thing deployed.

We can of course do better than just raw sudo. Next we tried a thing called rootwrap, which was mostly an attempt to provide a better boundary around exactly what commands you can expect an OpenStack binary to execute. So for example, maybe its ok for me to read the contents of a configuration file specific to a virtual machine I am managing, but I probably shouldn’t be able to read /etc/shadow or whatever. We can do that by doing something like this:

sudo nova-rootwrap /etc/nova/rootwrap.conf /bin/ls /etc

Where nova-rootwrap is a program which takes a configuration file and a command line to run. The contents of the configuration file are used to determine if the command line should be executed.

Now we can limit the sudo configuration file to only needing to be able to execute nova-rootwrap. I thought about putting in a whole bunch of slides about exactly how to configure rootwrap, but then I realised that this talk is only 25 minutes and you can totally google that stuff.

So instead, here’s my second actionable thing… Is there a trivial change you can make which will dramatically improve security? I don’t think anyone would claim that rootwrap is rocket science, but it improved things a lot — deployers didn’t need to grep out the command lines we executed any more, and we could do things like specify what paths we were allowed to do things in. Are there similarly trivial changes that you can make to improve your world?

But wait! Here’s my third actionable thing as well — what are the costs of your design? Some of these are obvious — for example with this design executing something with escalated permissions causes us to pay to fork a process. In fact its worse with rootwrap, because we pay to fork, start a python interpreter to parse a configuration file, and then fork again for the actual binary we wanted in the first place. That cost adds up if you need to execute many small commands, for example when plugging in a new virtual network interface. At one point we measured this for network interfaces and the costs were in the tens of seconds per interface.

There is another cost though which I think is actually more important. The only way we have with this mechanism to do something with escalated permissions is to execute it as a separate process. This is a horrible interface and forces us to do some really weird things. Let’s checkout some examples…

Which of the following commands are reasonable?

shred –n3 –sSIZE PATH touch PATH rm –rf PATH mkdir –p PATH

These are just some examples, there are many others. The first is probably the most reasonable. It doesn’t seem wise to me for us to implement our own data shredding code, so using a system command for that seems reasonable. The other examples are perhaps less reasonable — the rm one is particularly scary to me. But none of these are the best example…

How about this one?

utils.execute('tee',               ('/sys/class/net/%s/bridge/multicast_snooping' %                br_name),               process_input='0',               run_as_root=True,               check_exit_code=[0, 1])

Some commentary first. This code existed in the middle of a method that does other things. Its one of five command lines that method executes. What does it do?

Its actually not too bad. Using root permissions, it writes a zero to the multicast_snooping sysctl for the network bridge being setup. It then checks the exit code and raises an exception if its not 0 or 1.

That said, its also horrid. In order to write a single byte to a sysctl as root, we are forced to fork, start a python process, read a configuration file, and then fork again. For an operation that in some situations might need to happen hundreds of times for OpenStack to restart on a node.

This is how we get to the third way that OpenStack does escalated permissions. If we could just write python code that ran as root, we could write this instead:

with open(('/sys/class/net/%s/bridge/multicast_snooping' %            br_name), 'w') as f: f.write('0')

Its not perfect, but its a lot cheaper to execute and we could put it in a method with a helpful name like “disable multicast snooping” for extra credit. Which brings us to…

Hire Angus Lees and make him angry. Angus noticed this problem well before the rest of us. We were all lounging around basking in our own general cleverness. What Angus proposed is that instead of all this forking and parsing and general mucking around, that we just start a separate process at startup with special permissions, and then send it commands to execute.

He could have done that with a relatively horrible API, for example just sending command lines down the pipe and getting their responses back to parse, but instead he implemented a system of python decorators which let us call a method which is marked up as saying “I want to run as root!”.

So here’s the destination in our journey, how we actually do that thing in OpenStack now:

@nova.privsep.sys_admin_pctxt.entrypoint def disable_multicast_snooping(bridge): path = ('/sys/class/net/%s/bridge/multicast_snooping' %             bridge) if not os.path.exists(path): raise exception.FileNotFound(file_path=path) with open(path, 'w') as f: f.write('0')

The decorator before the method definition is a bit opaque, but basically says “run this thing as root”, and the rest is a method which can be called from anywhere within our code.

There are a few things you need to do to setup privsep, but I don’t have time in this talk to discuss the specifics. Effectively you need to arrange for the privsep helper to start with escalated permissions, and you need to move the code which will run with one of these decorators to a sub path of your source tree to stop other code from accidentally being escalated. privsep is also capable of running with more than one set of permissions — it will start a helper for each set. That’s what this decorator is doing, specifying what permissions we need for this method.

And here we land at my final actionable thing. Make it easy to do the  right thing, and hard to do the wrong thing. Rusty Russell used to talk about this at linux.conf.au when he was going through a phase of trying to clean up kernel APIs — its important that your interfaces make it obvious how to use them correctly, and make it hard to use them incorrectly.

In the example used for this talk, having command lines executed as root meant that the prevalent example of how to do many things was a command line. So people started doing that even when they didn’t need escalated permissions — for example calling mkdir instead of using our helper function to recursively make a set of directories.

We’ve cleaned that up, but we’ve also made it much much harder to just drop a command line into our code base to run as root, which will hopefully stop some of this problem re-occuring in the future. I don’t think OpenStack has reached perfection in this regard yet, but we continue to improve a little each day and that’s probably all we can hope for.

privsep can be used for non-OpenStack projects too. There’s really nothing specific about most of OpenStack’s underlying libraries in fact, and there’s probably things there which are useful to you. In fact the real problem is working out what is where because there’s so much of it.

One final thing — privsep makes it possible to specify the exact permissions needed to do something. For example, setting up a network bridge probably doesn’t need “read everything on the filesystem” permissions. We originally did that, but stepped back to using a singled escalated permissions set that maps to what you get with sudo, because working out what permissions a single operation needed was actually quite hard. We were trying to lower the barrier for entry for doing things the right way. I don’t think I really have time to dig into that much more here, but I’d be happy to chat about it sometime this weekend or on the Internet later.

So in summary:

  • Don’t assume someone else will solve the problem for you.
  • Are there trivial changes you can make that will drastically improve security?
  • Think about the costs of your design.
  • Hire smart people and let them be annoyed about things that have always “just been than way”. Let them fix those things.
  • Make it easy to do things the right way and hard to do things the wrong way.

I’d be happy to help people get privsep into their code, and its very usable outside of OpenStack. There are a couple of blog posts about that on my site at http://www.madebymikal.com/?s=privsep, but feel free to contact me at mikal@stillhq.com if you’d like to chat.

Julien Goodwin: Custom output pods for the Standard Research CG635 Clock Generator

Thu, 2018-08-23 18:03
As part of my previously mentioned side project the ability to replace crystal oscillators in a circuit with a higher quality frequency reference is really handy, to let me eliminate a bunch of uncertainty from some test setups.

A simple function generator is the classic way to handle this, although if you need square wave output it quickly gets hard to find options, with arbitrary waveform generators (essentially just DACs) the common option. If you can get away with just sine wave output an RF synthesizer is the other main option.

While researching these I discovered the CG635 Clock Generator from Stanford Research, and some time later picked one of these up used.

As well as being a nice square wave generator at arbitrary voltages these also have another set of outputs on the rear of the unit on an 8p8c (RJ45) connector, in both RS422 (for lower frequencies) and LVDS (full range) formats, as well as some power rails to allow a variety of less common output formats.

All I needed was 1.8v LVCMOS output, and could get that from the front panel output, but I'd then need a coax tail on my boards, as well as potentially running into voltage rail issues so I wanted to use the pod output instead. Unfortunately none of the pods available from Stanford Research do LVCMOS output, so I'd have to make my own, which I did.

The key chip in my custom pod is the TI SN65LVDS4, a 1.8v capable single channel LVDS reciever that operates at the frequencies I need. The only downside is this chip is only available in a single form factor, a 1.5mm x 2mm 10 pin UQFN, which is far too small to hand solder with an iron. The rest of the circuit is just some LED indicators to signal status.


Here's a rendering of the board from KiCad.

Normally "not hand solderable" for me has meant getting the board assembled, however my normal assembly house doesn't offer custom PCB finishes, and I wanted these to have white solder mask with black silkscreen as a nice UX when I go to use them, so instead I decided to try my hand at skillet reflow as it's a nice option given the space I've got in my tiny apartment (the classic tutorial on this from SparkFun is a good read if you're interested). Instead of just a simple plate used for cooking you can now buy hot plates with what are essentially just soldering iron temperature controllers, sold as pre-heaters making it easier to come close to a normal soldering profile.

Sadly, actually acquiring the hot plate turned into a bit of a mess, the first one I ordered in May never turned up, and it wasn't until mid-July that one arrived from a different supplier.

Because of the aforementioned lack of space instead of using stencils I simply hand-applied (leaded) paste, without even an assist tool (which I probably will acquire for next time), then hand-mounted the components, and dropped them on the plate to reflow. I had one resistor turn 90 degrees, and a few bridges from excessive paste, but for a first attempt I was really happy.


Here's a photo of the first two just after being taken off the hot plate.

Once the reflow was complete it was time to start testing, and this was where I ran into my biggest problems.

The two big problems were with the power supply I was using, and with my oscilloscope.

The power supply (A Keithley 228 Voltage/Current source) is from the 80's (Keithley's "BROWN" era), and while it has nice specs, doesn't have the most obvious UI. Somehow I'd set it to limit at 0ma output current, and if you're not looking at the segment lights it's easy to miss. At work I have an EEZ H24005 which also resets the current limit to zero on clear, however it's much more obvious when limiting, and a power supply with that level of UX is now on my "to buy" list.

The issues with my scope were much simpler. Currently I only have an old Rigol DS1052E scope, and while it works fine it is a bit of a pain to use, but ultimately I made a very simple mistake while testing. I was feeding in a trigger signal direct from the CG635's front outputs, and couldn't figure out why the generator was putting out such a high voltage (implausibly so). To cut the story short, I'd simply forgotten that the scope was set for use with 10x probes, and once I realised that everything made rather more sense. An oscilloscope with auto-detection for 10x probes, as well as a bunch of other features I want in a scope (much bigger screen for one), has now been ordered, but won't arrive for a while yet.

Ultimately the boards work fine, but until the new scope arrives I can't determine signal quality of them, but at least they're ready for when I'll need them, which is great for flow.

Gary Pendergast: Trying Mastodon

Mon, 2018-08-20 10:04

It’s no secret that Twitter is red hot garbage fire, so I’ve signed up for a Mastodon account to give them a try. Because I’m super vain, I decided to create my own Mastodon instance, with a custom domain.

Gary (@gary@pento.net)

7 Toots, 16 Following, 52 Followers · Web and music geek. I write poetry for WordPress.

Gary’s Mastadon

Mastodon is kind of weird to sign up for: think of it as being kind of like email. You can sign up for a big provider like Gmail or Outlook, you can run your own server, or you can pay someone to run a server for you. In my case, I’m paying for a personal account on Masto.host, they’ve been super helpful in getting it all configured. If you’re just looking to try it out for free, here’s a tool to help you choose an instance.

Some Initial Impressions

It’s pretty quiet, of course. I’m currently following 13 people on Mastodon, and 500 on Twitter, there’s going to be a difference in volume.

Mastodon Bridge is a useful tool for you to be able to find your Twitter friends when you start out. I highly recommend using it.

None of the apps have the polish that I’m used to with Twitter apps like Twitterific and Fenix, but they’re evolving quickly. I do miss Tweet Marker support. I’ve settled on using Whalebird for my MacOS app, none of the various Android apps have appealed enough for me to start using them.

Search is pretty bad. There’s a confusing limitation (for a reasonable, but not really satisfactory technical reason): You can only search through the Toots that people on the same instance as you have subscribed to. So, because I’m on only person on pento.net, I can only search through the Toots of people I’m immediately subscribed to. If you sign up for a big host, you’ll see many more results, but you still won’t see everything.

I’m going to have to share this post manually, because Jetpack doesn’t know how to share to Mastodon yet.

David Rowe: How Fourier Transforms Work

Thu, 2018-08-16 10:04

The best explanation of the Discrete Fourier Transform (DFT) I have ever seen from Bill Cowley on his Low SNR blog.

sthbrx - a POWER technical blog: Improving performance of Phoronix benchmarks on POWER9

Wed, 2018-08-15 15:22

Recently Phoronix ran a range of benchmarks comparing the performance of our POWER9 processor against the Intel Xeon and AMD EPYC processors.

We did well in the Stockfish, LLVM Compilation, Zstd compression, and the Tinymembench benchmarks. A few of my colleagues did a bit of investigating into some the benchmarks where we didn't perform quite so well.

LBM / Parboil

The Parboil benchmarks are a collection of programs from various scientific and commercial fields that are useful for examining the performance and development of different architectures and tools. In this round of benchmarks Phoronix used the lbm benchmark: a fluid dynamics simulation using the Lattice-Boltzmann Method.

lbm is an iterative algorithm - the problem is broken down into discrete time steps, and at each time step a bunch of calculations are done to simulate the change in the system. Each time step relies on the results of the previous one.

The benchmark uses OpenMP to parallelise the workload, spreading the calculations done in each time step across many CPUs. The number of calculations scales with the resolution of the simulation.

Unfortunately, the resolution (and therefore the work done in each time step) is too small for modern CPUs with large numbers of SMT (simultaneous multi-threading) threads. OpenMP doesn't have enough work to parallelise and the system stays relatively idle. This means the benchmark scales relatively poorly, and is definitely not making use of the large POWER9 system

Also this benchmark is compiled without any optimisation. Recompiling with -O3 improves the results 3.2x on POWER9.

x264 Video Encoding

x264 is a library that encodes videos into the H.264/MPEG-4 format. x264 encoding requires a lot of integer kernels doing operations on image elements. The math and vectorisation optimisations are quite complex, so Nick only had a quick look at the basics. The systems and environments (e.g. gcc version 8.1 for Skylake, 8.0 for POWER9) are not completely apples to apples so for now patterns are more important than the absolute results. Interestingly the output video files between architectures are not the same, particularly with different asm routines and compiler options used, which makes it difficult to verify the correctness of any changes.

All tests were run single threaded to avoid any SMT effects.

With the default upstream build of x264, Skylake is significantly faster than POWER9 on this benchmark (Skylake: 9.20 fps, POWER9: 3.39 fps). POWER9 contains some vectorised routines, so an initial suspicion is that Skylake's larger vector size may be responsible for its higher throughput.

Let's test our vector size suspicion by restricting Skylake to SSE4.2 code (with 128 bit vectors, the same width as POWER9). This hardly slows down the x86 CPU at all (Skylake: 8.37 fps, POWER9: 3.39 fps), which indicates it's not taking much advantage of the larger vectors.

So the next guess would be that x86 just has more and better optimized versions of costly functions (in the version of x264 that Phoronix used there are only six powerpc specific files compared with 21 x86 specific files). Without the time or expertise to dig into the complex task of writing vector code, we'll see if the compiler can help, and turn on autovectorisation (x264 compiles with -fno-tree-vectorize by default, which disables auto vectorization). Looking at a perf profile of the benchmark we can see that one costly function, quant_4x4x4, is not autovectorised. With a small change to the code, gcc does vectorise it, giving a slight speedup with the output file checksum unchanged (Skylake: 9.20 fps, POWER9: 3.83 fps).

We got a small improvement with the compiler, but it looks like we may have gains left on the table with our vector code. If you're interested in looking into this, we do have some active bounties for x264 (lu-zero/x264).

Test Skylake POWER9 Original - AVX256 9.20 fps 3.39 fps Original - SSE4.2 8.37 fps 3.39 fps Autovectorisation enabled, quant_4x4x4 vectorised 9.20 fps 3.83 fps

Nick also investigated running this benchmark with SMT enabled and across multiple cores, and it looks like the code is not scalable enough to feed 176 threads on a 44 core system. Disabling SMT in parallel runs actually helped, but there was still idle time. That may be another thing to look at, although it may not be such a problem for smaller systems.

Primesieve

Primesieve is a program and C/C++ library that generates all the prime numbers below a given number. It uses an optimised Sieve of Eratosthenes implementation.

The algorithm uses the L1 cache size as the sieve size for the core loop. This is an issue when we are running in SMT mode (aka more than one thread per core) as all threads on a core share the same L1 cache and so will constantly be invalidating each others cache-lines. As you can see in the table below, running the benchmark in single threaded mode is 30% faster than in SMT4 mode!

This means in SMT-4 mode the workload is about 4x too large for the L1 cache. A better sieve size to use would be the L1 cache size / number of threads per core. Anton posted a pull request to update the sieve size.

It is interesting that the best overall performance on POWER9 is with the patch applied and in SMT2 mode:

SMT level baseline patched 1 14.728s 14.899s 2 15.362s 14.040s 4 19.489s 17.458s LAME

Despite its name, a recursive acronym for "LAME Ain't an MP3 Encoder", LAME is indeed an MP3 encoder.

Due to configure options not being parsed correctly this benchmark is built without any optimisation regardless of architecture. We see a massive speedup by turning optimisations on, and a further 6-8% speedup by enabling USE_FAST_LOG (which is already enabled for Intel).

LAME Duration Speedup Default 82.1s n/a With optimisation flags 16.3s 5.0x With optimisation flags and USE_FAST_LOG set 15.6s 5.3x

For more detail see Joel's writeup.

FLAC

FLAC is an alternative encoding format to MP3. But unlike MP3 encoding it is lossless! The benchmark here was encoding audio files into the FLAC format.

The key part of this workload is missing vector support for POWER8 and POWER9. Anton and Amitay submitted this patch series that adds in POWER specific vector instructions. It also fixes the configuration options to correctly detect the POWER8 and POWER9 platforms. With this patch series we get see about a 3x improvement in this benchmark.

OpenSSL

OpenSSL is among other things a cryptographic library. The Phoronix benchmark measures the number of RSA 4096 signs per second:

$ openssl speed -multi $(nproc) rsa4096

Phoronix used OpenSSL-1.1.0f, which is almost half as slow for this benchmark (on POWER9) than mainline OpenSSL. Mainline OpenSSL has some powerpc multiplication and squaring assembly code which seems to be responsible for most of this speedup.

To see this for yourself, add these four powerpc specific commits on top of OpenSSL-1.1.0f:

  1. perlasm/ppc-xlate.pl: recognize .type directive
  2. bn/asm/ppc-mont.pl: prepare for extension
  3. bn/asm/ppc-mont.pl: add optimized multiplication and squaring subroutines
  4. ppccap.c: engage new multipplication and squaring subroutines

The following results were from a dual 16-core POWER9:

Version of OpenSSL Signs/s Speedup 1.1.0f 1921 n/a 1.1.0f with 4 patches 3353 1.74x 1.1.1-pre1 3383 1.76x SciKit-Learn

SciKit-Learn is a bunch of python tools for data mining and analysis (aka machine learning).

Joel noticed that the benchmark spent 92% of the time in libblas. Libblas is a very basic BLAS (basic linear algebra subprograms) library that python-numpy uses to do vector and matrix operations. The default libblas on Ubuntu is only compiled with -O2. Compiling with -Ofast and using alternative BLAS's that have powerpc optimisations (such as libatlas or libopenblas) we see big improvements in this benchmark:

BLAS used Duration Speedup libblas -O2 64.2s n/a libblas -Ofast 36.1s 1.8x libatlas 8.3s 7.7x libopenblas 4.2s 15.3x

You can read more details about this here.

Blender

Blender is a 3D graphics suite that supports image rendering, animation, simulation and game creation. On the surface it appears that Blender 2.79b (the distro package version that Phoronix used by system/blender-1.0.2) failed to use more than 15 threads, even when "-t 128" was added to the Blender command line.

It turns out that even though this benchmark was supposed to be run on CPUs only (you can choose to render on CPUs or GPUs), the GPU file was always being used. The GPU file is configured with a very large tile size (256x256) - which is fine for GPUs but not great for CPUs. The image size (1280x720) to tile size ratio limits the number of jobs created and therefore the number threads used.

To obtain a realistic CPU measurement with more that 15 threads you can force the use of the CPU file by overwriting the GPU file with the CPU one:

$ cp ~/.phoronix-test-suite/installed-tests/system/blender-1.0.2/benchmark/pabellon_barcelona/pavillon_barcelone_cpu.blend ~/.phoronix-test-suite/installed-tests/system/blender-1.0.2/benchmark/pabellon_barcelona/pavillon_barcelone_gpu.blend

As you can see in the image below, now all of the cores are being utilised!

Fortunately this has already been fixed in pts/blender-1.1.1. Thanks to the report by Daniel it has also been fixed in system/blender-1.1.0.

Pinning the pts/bender-1.0.2, Pabellon Barcelona, CPU-Only test to a single 22-core POWER9 chip (sudo ppc64_cpu --cores-on=22) and two POWER9 chips (sudo ppc64_cpu --cores-on=44) show a huge speedup:

Benchmark Duration (deviation over 3 runs) Speedup Baseline (GPU blend file) 1509.97s (0.30%) n/a Single 22-core POWER9 chip (CPU blend file) 458.64s (0.19%) 3.29x Two 22-core POWER9 chips (CPU blend file) 241.33s (0.25%) 6.25x tl;dr

Some of the benchmarks where we don't perform as well as Intel are where the benchmark has inline assembly for x86 but uses generic C compiler generated assembly for POWER9. We could probably benefit with some more powerpc optimsed functions.

We also found a couple of things that should result in better performance for all three architectures, not just POWER.

A summary of the performance improvements we found:

Benchmark Approximate Improvement Parboil 3x x264 1.1x Primesieve 1.1x LAME 5x FLAC 3x OpenSSL 2x SciKit-Learn 7-15x Blender 3x

There is obviously room for more improvements, especially with the Primesieve and x264 benchmarks, but it would be interesting to see a re-run of the Phoronix benchmarks with these changes.

Thanks to Anton, Daniel, Joel and Nick for the analysis of the above benchmarks.

sthbrx - a POWER technical blog: Improving performance of Phoronix benchmarks on POWER9

Wed, 2018-08-15 15:22

Recently Phoronix ran a range of benchmarks comparing the performance of our POWER9 processor against the Intel Xeon and AMD EPYC processors.

We did well in the Stockfish, LLVM Compilation, Zstd compression, and the Tinymembench benchmarks. A few of my colleagues did a bit of investigating into some the benchmarks where we didn't perform quite so well.

LBM / Parboil

The Parboil benchmarks are a collection of programs from various scientific and commercial fields that are useful for examining the performance and development of different architectures and tools. In this round of benchmarks Phoronix used the lbm benchmark: a fluid dynamics simulation using the Lattice-Boltzmann Method.

lbm is an iterative algorithm - the problem is broken down into discrete time steps, and at each time step a bunch of calculations are done to simulate the change in the system. Each time step relies on the results of the previous one.

The benchmark uses OpenMP to parallelise the workload, spreading the calculations done in each time step across many CPUs. The number of calculations scales with the resolution of the simulation.

Unfortunately, the resolution (and therefore the work done in each time step) is too small for modern CPUs with large numbers of SMT (simultaneous multi-threading) threads. OpenMP doesn't have enough work to parallelise and the system stays relatively idle. This means the benchmark scales relatively poorly, and is definitely not making use of the large POWER9 system

Also this benchmark is compiled without any optimisation. Recompiling with -O3 improves the results 3.2x on POWER9.

x264 Video Encoding

x264 is a library that encodes videos into the H.264/MPEG-4 format. x264 encoding requires a lot of integer kernels doing operations on image elements. The math and vectorisation optimisations are quite complex, so Nick only had a quick look at the basics. The systems and environments (e.g. gcc version 8.1 for Skylake, 8.0 for POWER9) are not completely apples to apples so for now patterns are more important than the absolute results. Interestingly the output video files between architectures are not the same, particularly with different asm routines and compiler options used, which makes it difficult to verify the correctness of any changes.

All tests were run single threaded to avoid any SMT effects.

With the default upstream build of x264, Skylake is significantly faster than POWER9 on this benchmark (Skylake: 9.20 fps, POWER9: 3.39 fps). POWER9 contains some vectorised routines, so an initial suspicion is that Skylake's larger vector size may be responsible for its higher throughput.

Let's test our vector size suspicion by restricting Skylake to SSE4.2 code (with 128 bit vectors, the same width as POWER9). This hardly slows down the x86 CPU at all (Skylake: 8.37 fps, POWER9: 3.39 fps), which indicates it's not taking much advantage of the larger vectors.

So the next guess would be that x86 just has more and better optimized versions of costly functions (in the version of x264 that Phoronix used there are only six powerpc specific files compared with 21 x86 specific files). Without the time or expertise to dig into the complex task of writing vector code, we'll see if the compiler can help, and turn on autovectorisation (x264 compiles with -fno-tree-vectorize by default, which disables auto vectorization). Looking at a perf profile of the benchmark we can see that one costly function, quant_4x4x4, is not autovectorised. With a small change to the code, gcc does vectorise it, giving a slight speedup with the output file checksum unchanged (Skylake: 9.20 fps, POWER9: 3.83 fps).

We got a small improvement with the compiler, but it looks like we may have gains left on the table with our vector code. If you're interested in looking into this, we do have some active bounties for x264 (lu-zero/x264).

Test Skylake POWER9 Original - AVX256 9.20 fps 3.39 fps Original - SSE4.2 8.37 fps 3.39 fps Autovectorisation enabled, quant_4x4x4 vectorised 9.20 fps 3.83 fps

Nick also investigated running this benchmark with SMT enabled and across multiple cores, and it looks like the code is not scalable enough to feed 176 threads on a 44 core system. Disabling SMT in parallel runs actually helped, but there was still idle time. That may be another thing to look at, although it may not be such a problem for smaller systems.

Primesieve

Primesieve is a program and C/C++ library that generates all the prime numbers below a given number. It uses an optimised Sieve of Eratosthenes implementation.

The algorithm uses the L1 cache size as the sieve size for the core loop. This is an issue when we are running in SMT mode (aka more than one thread per core) as all threads on a core share the same L1 cache and so will constantly be invalidating each others cache-lines. As you can see in the table below, running the benchmark in single threaded mode is 30% faster than in SMT4 mode!

This means in SMT-4 mode the workload is about 4x too large for the L1 cache. A better sieve size to use would be the L1 cache size / number of threads per core. Anton posted a pull request to update the sieve size.

It is interesting that the best overall performance on POWER9 is with the patch applied and in SMT2 mode:

SMT level baseline patched 1 14.728s 14.899s 2 15.362s 14.040s 4 19.489s 17.458s LAME

Despite its name, a recursive acronym for "LAME Ain't an MP3 Encoder", LAME is indeed an MP3 encoder.

Due to configure options not being parsed correctly this benchmark is built without any optimisation regardless of architecture. We see a massive speedup by turning optimisations on, and a further 6-8% speedup by enabling USE_FAST_LOG (which is already enabled for Intel).

LAME Duration Speedup Default 82.1s n/a With optimisation flags 16.3s 5.0x With optimisation flags and USE_FAST_LOG set 15.6s 5.3x

For more detail see Joel's writeup.

FLAC

FLAC is an alternative encoding format to MP3. But unlike MP3 encoding it is lossless! The benchmark here was encoding audio files into the FLAC format.

The key part of this workload is missing vector support for POWER8 and POWER9. Anton and Amitay submitted this patch series that adds in POWER specific vector instructions. It also fixes the configuration options to correctly detect the POWER8 and POWER9 platforms. With this patch series we get see about a 3x improvement in this benchmark.

OpenSSL

OpenSSL is among other things a cryptographic library. The Phoronix benchmark measures the number of RSA 4096 signs per second:

$ openssl speed -multi $(nproc) rsa4096

Phoronix used OpenSSL-1.1.0f, which is almost half as slow for this benchmark (on POWER9) than mainline OpenSSL. Mainline OpenSSL has some powerpc multiplication and squaring assembly code which seems to be responsible for most of this speedup.

To see this for yourself, add these four powerpc specific commits on top of OpenSSL-1.1.0f:

  1. perlasm/ppc-xlate.pl: recognize .type directive
  2. bn/asm/ppc-mont.pl: prepare for extension
  3. bn/asm/ppc-mont.pl: add optimized multiplication and squaring subroutines
  4. ppccap.c: engage new multipplication and squaring subroutines

The following results were from a dual 16-core POWER9:

Version of OpenSSL Signs/s Speedup 1.1.0f 1921 n/a 1.1.0f with 4 patches 3353 1.74x 1.1.1-pre1 3383 1.76x SciKit-Learn

SciKit-Learn is a bunch of python tools for data mining and analysis (aka machine learning).

Joel noticed that the benchmark spent 92% of the time in libblas. Libblas is a very basic BLAS (basic linear algebra subprograms) library that python-numpy uses to do vector and matrix operations. The default libblas on Ubuntu is only compiled with -O2. Compiling with -Ofast and using alternative BLAS's that have powerpc optimisations (such as libatlas or libopenblas) we see big improvements in this benchmark:

BLAS used Duration Speedup libblas -O2 64.2s n/a libblas -Ofast 36.1s 1.8x libatlas 8.3s 7.7x libopenblas 4.2s 15.3x

You can read more details about this here.

Blender

Blender is a 3D graphics suite that supports image rendering, animation, simulation and game creation. On the surface it appears that Blender 2.79b (the distro package version that Phoronix used by system/blender-1.0.2) failed to use more than 15 threads, even when "-t 128" was added to the Blender command line.

It turns out that even though this benchmark was supposed to be run on CPUs only (you can choose to render on CPUs or GPUs), the GPU file was always being used. The GPU file is configured with a very large tile size (256x256) - which is fine for GPUs but not great for CPUs. The image size (1280x720) to tile size ratio limits the number of jobs created and therefore the number threads used.

To obtain a realistic CPU measurement with more that 15 threads you can force the use of the CPU file by overwriting the GPU file with the CPU one:

$ cp ~/.phoronix-test-suite/installed-tests/system/blender-1.0.2/benchmark/pabellon_barcelona/pavillon_barcelone_cpu.blend ~/.phoronix-test-suite/installed-tests/system/blender-1.0.2/benchmark/pabellon_barcelona/pavillon_barcelone_gpu.blend

As you can see in the image below, now all of the cores are being utilised!

Fortunately this has already been fixed in pts/blender-1.1.1. Thanks to the report by Daniel it has also been fixed in system/blender-1.1.0.

Pinning the pts/bender-1.0.2, Pabellon Barcelona, CPU-Only test to a single 22-core POWER9 chip (sudo ppc64_cpu --cores-on=22) and two POWER9 chips (sudo ppc64_cpu --cores-on=44) show a huge speedup:

Benchmark Duration (deviation over 3 runs) Speedup Baseline (GPU blend file) 1509.97s (0.30%) n/a Single 22-core POWER9 chip (CPU blend file) 458.64s (0.19%) 3.29x Two 22-core POWER9 chips (CPU blend file) 241.33s (0.25%) 6.25x tl;dr

Some of the benchmarks where we don't perform as well as Intel are where the benchmark has inline assembly for x86 but uses generic C compiler generated assembly for POWER9. We could probably benefit with some more powerpc optimsed functions.

We also found a couple of things that should result in better performance for all three architectures, not just POWER.

A summary of the performance improvements we found:

Benchmark Approximate Improvement Parboil 3x x264 1.1x Primesieve 1.1x LAME 5x FLAC 3x OpenSSL 2x SciKit-Learn 7-15x Blender 3x

There is obviously room for more improvements, especially with the Primesieve and x264 benchmarks, but it would be interesting to see a re-run of the Phoronix benchmarks with these changes.

Thanks to Anton, Daniel, Joel and Nick for the analysis of the above benchmarks.

Michael Still: city2surf 2018 wrap up

Mon, 2018-08-13 16:00

city2surf 2018 was yesterday, so how did the race go? First off, thanks to everyone who helped out with my fund raising for the Black Dog Institute — you raised nearly $2,000 AUD for this important charity, which is very impressive. Thanks for everyone’s support!

city2surf is 14kms, with 166 meters of vertical elevation gain. For the second year running I was in the green start group, which is for people who have previously finished the event in less than 90 minutes. There is one start group before this, red, which is for people who can finish in less than 70 minutes. In reality I think its unlikely that I’ll ever make it to red — it would require me to shave about 30 seconds per kilometre off my time to just scrape in, and I think that would be hard to do.

Training for city2surf last year I tore my right achilles, so I was pretty much starting from scratch for this years event — at the start of the year I could run about 50 meters before I had issues. Luckily I was referred to an excellent physiotherapist who has helped me build back up safely — I highly recommend Cameron at Southside Physio Therapy if you live in Canberra.

Overall I ran a lot in training for this year — a total of 540 kilometres. I was also a lot more consistent than in previous years, which is something I’m pretty proud of given how cold winters are in Canberra. Cold weather, short days, and getting sick seem to always get in the way of winter training for me.

On the day I was worried about being cold while running, but that wasn’t an issue. It was about 10 degrees when we started and maybe a couple of degrees warmer than that at the end. The maximum for the day was only 16, which is cold for Sydney at this time of year. There was a tiny bit of spitting rain, but nothing serious. Wind was the real issue — it was very windy at the finish, and I think if it had been like that for the entire race it would have been much less fun.

That said, I finished in 76:32, which is about three minutes faster than last year and a personal best. Overall, an excellent experience and I’ll be back again.

Matthew Oliver: Keystone Federated Swift – Final post coming

Fri, 2018-08-10 12:05

This is a quick post to say the final topology post is coming. It’s currently in draft from and I hope to post it soon. I just realised it’s been a while so thought I’d better give an update.

 

The last post goes into what auth does, what is happening in keystone, what needs to happen  to really make this topology work and then talks about the brittle POC I created to have something to demo. I’ll be discussing other better options/alternative. But all this means it’s become much more detailed then I originally expected. I’ll hope to get it up by mid next week.

 

Thanks for waiting.

David Rowe: Testing a RTL-SDR with FSK on HF

Tue, 2018-08-07 16:04

There’s a lot of discussion about ADC resolution and SDRs. I’m trying to develop a simple HF data system that uses RTL-SDRs in “Direct Sample” mode. This blog post describes how I measured the Minimum Detectable Signal (MDS) of my 100 bit/s 2FSK receiver, and a spreadsheet model of the receiver that explains my results.

Noise in a receiver comes from all sorts of places. There are two sources of concern for this project – HF band noise and ADC quantisation noise. On lower HF frequencies (7MHz and below) I’m guess-timating signals weaker than -100dBm will be swamped by HF band noise. So I’d like a receiver that has a MDS anywhere under that. The big question is, can we build such a receiver using a low cost SDR?

Experimental Set Up

So I hooked up the experimental setup in the figure below:

The photo shows the actual hardware. It was spaced apart a bit further for the actual test:

Rpitx is sending 2FSK at 100 bit/s and about 14dBm Tx power. It then gets attenuated by some fixed and variable attenuators to beneath -100dBm. I fed the signal into a RTL-SDR plugged into my laptop, demodulated the 2FSK signal, and measured the Bit Error Rate (BER).

I tried a command line receiver:

rtl_sdr -s 1200000 -f 7000000 -D 2 - | csdr convert_u8_f | csdr shift_addition_cc `python -c "print float(7000000-7177000)/1200000"` | csdr fir_decimate_cc 25 0.05 HAMMING | csdr bandpass_fir_fft_cc 0 0.1 0.05 | csdr realpart_cf | csdr convert_f_s16 | ~/codec2-dev/build_linux/src/fsk_demod 2 48000 100 - - | ~/codec2-dev/build_linux/src/fsk_put_test_bits -

and also gqrx, using this configuration:

with the very handy UDP output option sending samples to the FSK demodulator:

$ nc -ul 7355 | ./fsk_demod 2 48000 100 - - | ./fsk_put_test_bits -

Both versions demodulate the FSK signal and print the bit error rate in real time. I really love the csdr tools, and gqrx is also great for a more visual look at the signal and the ability to monitor the audio.

For these tests the gqrx receiver worked best. It attenuated nearby interferers better (i.e. better sideband rejection) and worked at lower Rx signal levels. It also has a “hardware AGC” option that I haven’t worked out how to enable in the command line tools. However for my target use case I’ll eventually need a command line version, so I’ll have to improve the command line version some time.

The RF Gods are smiling on me today. This experimental set up actually works better than previous bench tests where we needed to put the Tx in another room to get enough isolation. I can still get 10dB steps from the attenuator at -120dBm (ish) with the Tx a few feet from the Rx. It might be the ferrites on the cable to the switched attenuator.

I tested the ability to get solid 10dB steps using a CW (continuous sine wave) signal using the “test” utility in rpitx. FSK bounces about too much, especially with the narrow spec an settings I need to measure weak signals. The configuration of the Rigol DSA815 I used to see the weak signals is described at the end of this post on the SM2000.

The switched attenuator just has 10dB steps. I am getting zero bit errors at -115dBm, and the modem fell over on the next step (-125dBm). So the MDS is somewhere in between.

Model

This spreadsheet (click for the file) models the receiver:

By poking the RTL-SDR with my signal generator, and plotting the output waveforms, I worked out that it clips at around -30dBm (a respectable S9+40dB). So that’s the strongest signal it can handle, at least using the rtl_sdr command line options I can find. Even though it’s an 8 bit ADC I figure there are 7 magnitude bits (the samples are unsigned chars). So we get 6dB per bit or 42dB dynamic range.

This lets us work out the the power of the quantisation noise (42dB beneath -30dBm). This noise power is effectively spread across the entire bandwidth of the ADC, a little bit of noise power for each Hz of bandwidth. The bandwidth is set by the sample rate of the RTL-SDRs internal ADC (28.8 MHz). So now we can work out No (N-nought), the power/unit Hz of bandwidth. It’s like a normalised version of the receiver “noise floor”. An ADC with more bits would have less quantisation noise.

There follows some modem working which gives us an estimate of the MDS for the modem. The MDS of -117.6dBm is between my two measurements above, so we have a good agreement between this model and the experimental results. Cool!

Falling through the Noise Floor

The “noise floor” depends on what you are trying to receive. If you are listening to wide 10kHz wide AM signal, you will be slurping up 10kHz of noise, and get a noise power of:

-146.6+10*log10(10000) = -106.6 dBm

So if you want that AM signal to have a SNR of 20dB, you need a received signal level of -86.6dB to get over the quantisation noise of the receiver.

I’m trying to receive low bit rate FSK which can handle a lot more noise before it falls over, as indicated in the spreadsheet above. So it’s more robust to the quantisation noise and we can have a much lower MDS.

The “noise floor” is not some impenetrable barrier. It’s just a convention, and needs to be considered relative to the bandwidth and type of the signal you want to receive.

One area I get confused about is noise bandwidth. In the model above I assume the noise band width is the same as the ADC sample rate. Please feel free to correct me if that assumption is wrong! With IQ signals we have to consider negative frequencies, complex to real conversions, which all affects noise power. I muddle through this when I play with modems but if anyone has a straight forward explanation of the noise bandwidth I’d love to hear it!

Blocking Tests

At the suggestion of Mark, I repeated the MDS tests with a strong CW interferer from my signal generator. I adjusted the Sig Gen and Rx levels until I could just detect the FSK signal. Here are the results, all in dBm:

Sig Gen 2FSK Rx MDS Difference -51 -116 65 -30 -96 66

The FSK signal was at 7.177MHz. I tried the interferer at 7MHz (177 kHz away) and 7.170MHz (just 7 kHz away) with the same blocking results. I’m pretty impressed that the system can continue to work with a 65dB stronger signal just 7kHz away.

So the interferer desensitises the receiver somewhat. When listening to the signal on gqrx, I can hear the FSK signal get much weaker when I switch the Sig Gen on. However it often keeps demodulating just fine – FSK is not sensitive to amplitude.

I can also hear spurious tones appearing; the quantisation noise isn’t really white noise any more when a strong signal is present. Makes me feel like I still have a lot to learn about this SDR caper, especially direct sampling receivers!

As with the MDS results – my blocking results are likely to depend on the nature of the signal I am trying to receive. For example a SSB signal or higher data rate might have different blocking results.

Still, 65dB rejection on a $27 radio (at least for my test modem signal) is not too shabby. I can get away with a S9+40dB (-30dBm) interferer just 7kHz away with my rx signal near the limits of where I want to detect (-96dBm).

Conclusions

So I figure for the lower HF bands this receivers performance is OK – the ADC quantisation noise isn’t likely to impact performance and the strong signal performance is good enough. An overload of -30dBm (S9+40dB) is also acceptable given the use case is remote communications where there is unlikely to be any nearby transmitters in the input filter passband.

The 100 bit/s signal is just a starting point. I can use that as a reference to help me understand how different modems and bit rates will perform. For example I can increase the bit rate to say 1000 bit/s 2FSK, increasing the MDS by 10dB, and still be well beneath my -100dBm MDS target. Good.

If it does falls over in the real world due to MDS performance, overload or blocking, I now have a good understanding of how it works so it will be possible to engineer a solution.

For example a pre-amp with X dB gain would lower the quantisation noise power by X dB and allow us to detect weaker signals but then the Rx would overload at -30-X dB. If we have strong signal problems but our target signal is also pretty strong we can insert an attenuator. If we drop in another SDR I can recompute the quantisation noise from it’s specs, and estimate how well it will perform.

Reading Further

Rpitx and 2FSK, first part in this series.
Spreadsheet used to do the working for the quantisation noise.