Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

sthbrx - a POWER technical blog: Interning at Ozlabs

Wed, 2016-06-08 22:22

I am sadly coming to the end of my six(ish) month internship with Ozlabs (funded by ACS). So here I am writing about my experience in the hopes that future prospective interns can read about how they should come and work with the previously dubbed Linux Gods.

What is your background?

Despite embracing being a nerd at school, my opinion of computers prior to starting my Engineering degree was that they were boring and for geeky boys who didn't want to interact with the 'real' world. However, when having to choose a specialisation of Engineering I was drawn towards Computer Systems, as everything else seemed obvious* but Computer Systems was this great mystical unknown.

Fast forward three years, and I had seen glimpses into the workings of this magical computer world. I had learnt about transistors, logic gates and opamps; I had designed circuits that actually worked; and I had bashed my head against a wall trying to find obscure bugs. I had dabbled in a range of languages from the low levels of VHDL and embedded C, to the abstract world of Python and Java and delved into the obscure world of declarative prologs and relational reinforcement learning. Now it was time to solidify some of these concepts and get some experience under my belt so I could feel less like a monkey bashing random keys on my keyboard. Enter Ozlabs!

What did you do at Ozlabs?

After being handed a nice laptop and the root passwords, I faced the inevitable battle of getting everything set up. With the help of my mentor, the prestigious Michael Ellerman, and various other Ozlabs residents I picked off some low hanging fruit such as removing unused code and tidying up a few things. This allowed me to get familiar with the open-source workflow, the kernel building process, IRC, do more with Git than just push and pull, and finally come face-to-face with the seemingly impossible: Vim and virtual machines.

I then got to learn about Transactional Memory (TM) - a way of making a bunch of instructions on one processor appear to be one atomic operation to other processors. I took some old TM tests from Mikey and checked that they did indeed pass and fail when they were supposed to and refurbished them a little, learning how to run kernel self-tests and a bit about powerpc assembly along the way.
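
For the curious, the kernel's selftest framework is what drives these; a sketch of the kind of invocation involved (run from a kernel source tree; exact targets and test names vary by kernel version):

# Build and run just the powerpc selftests
make -C tools/testing/selftests TARGETS=powerpc run_tests

# Or run an individual TM test binary directly, for example:
./tools/testing/selftests/powerpc/tm/tm-resched-dscr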

Eventually my fear of shell scripts was no match for my desire to be able to build and install a kernel with one command and so I finally got around to writing a build script. Accidentally rebooting a bare-metal machine instead of my VM running on it may have had a significant contribution to this...
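
Mine isn't published here, but the shape of such a script is roughly this (a sketch only; the VM name and install step are placeholders for whatever your setup needs):

#!/bin/bash
# Build the kernel and boot it on the test VM -- not the bare-metal host!
set -e
make -j"$(nproc)" vmlinux modules     # build in the kernel source tree
scp vmlinux testvm:/boot/             # 'testvm' is a placeholder guest
ssh testvm reboot                     # double-check which machine this is...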

The next interesting task I got to tackle was to implement a virtual memory dump that other architectures like x86 have, so we can see how the pages in memory are laid out along with information about these pages. This involved understanding x86's implementation and relating that to POWER's memory management. At uni I never quite understood the fuss about pages and virtual memory, so it was great to be able to build up an appreciation and play around with page tables, virtual-to-real addresses, and the hash table.
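
On x86 the feature being ported (CONFIG_X86_PTDUMP at the time) exposes the layout through debugfs, which gives a feel for the end result:

# Requires a kernel built with the page table dump option
mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/kernel_page_tables | head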

I then moved onto SROP mitigation! After a lot of reading and re-reading, I decided to first understand how to use SROP to make an exploit on POWER which meant some assembly, diving into the signal code and finally meeting and spending time with GDB. Once again I had x86 code to port over to POWER, the main issue being making sure that I didn't break existing things - aka hours and hours of running the kernel self-tests and the Linux Test Project tests and some more scripting, with the help of Chris Smart, to collate the results.

You can judge all my submitted patches here.

What was your overall experience like at Ozlabs?

I moved to Canberra shortly after finishing exams and so hadn't had time to ponder expectations of Ozlabs. Everyone was super friendly and, despite being not just the only female but the only kiwi among a whoooole lot of Aussies, I experienced a distinct lack of discrimination (apart from a bit of banter about accents).

Could I wear my normal clothes (and not stuffy business clothes)? Check. Did I get to work on interesting things? Check. Could I do my work without having to go through lots of unnecessary hoops and what not? Check. Could I develop my own workflow and learn all the things? Check. Did I get to delve into a few different areas? Check. Was I surrounded by super smart people who were willing to help me learn? Check.

All in all, I have had a great time here, learnt so much and you should definitely come and work at Ozlabs! Hopefully you'll see me back on this blog in a few months :)

* My pre-university, perhaps somewhat naive, opinion: Civil and Mechanical is just physics. Chemical and Materials is just chemistry. Electrical seems interesting but who wants to work with power lines? Biomedical is just math and biology. Software is just abstract high level nonsense. But how a computer works?? That is some magical stuff.

Russell Coker: Sysadmin Skills and University Degrees

Wed, 2016-06-08 17:02

I think that a major deficiency in Computer Science degrees is the lack of sysadmin training.

Version Control

The first thing that needs to be added is the basics of version control. CVS (which is now regarded as obsolete) was initially released when I was in the first year of university, but SCCS and RCS had been in use for some time. I think that the people who designed my course were remiss in not adding any mention of version control (not even strategies for saving old versions of your work); one could say that they taught us about version control by letting us accidentally delete our assignments. :-#

If a course is aimed at just teaching programmers (as most CS degrees are) then version control for group assignments should be a standard part of the course. Having some marks allocated for the quality of comments in the commit log would also be good.

A modern CS degree should cover distributed version control, which means covering Git as it’s the most popular distributed version control system nowadays.
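
Even the minimal workflow for a group assignment covers most of what graduates will need day to day; a sketch (repository URL hypothetical):

git clone git@gitlab.example.edu:group3/assignment.git
cd assignment
git add report.tex
git commit -m "Draft results section; fix units in table 2"   # commit-log quality could be marked
git pull --rebase && git push
git log --oneline        # the history markers and teammates will read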

For people who want to work as sysadmins (as opposed to developers who run their own PCs) a course should have an optional subject for version control of an entire system. That includes tools like etckeeper for version control of system configuration and tools like Puppet for automated configuration and system maintenance.
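
As a taste of what that subject would cover, etckeeper puts /etc under version control with a couple of commands (Debian-style sketch):

apt-get install etckeeper
etckeeper init                    # creates a git repository in /etc
etckeeper commit "Disable sshd password authentication"
cd /etc && git log --oneline sshd_config    # full history of one config file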

Dependencies

It’s quite reasonable for a CS degree to provide simplified problems for the students to solve so they can concentrate on one task. But in the real world the problems are more complex. One of the more difficult parts of managing real systems is dependencies. You have issues of header files etc. at compile time and library versions at deployment. Often you need a program to run on systems with different versions of the OS, which means making it compile on both and dealing with differences in behaviour.

There are lots of hacky things that people do to deal with dependencies in systems. People link compiled programs statically, install custom versions of interpreters in user home directories or /usr/local for daemons, and do many other things. These things can have bad consequences including data loss, system downtime, and security problems. It’s not always wrong to do such things, but it’s something that should only be done with knowledge of the potential consequences and a plan for mitigating them. A CS degree should teach the potential advantages and disadvantages of these options to allow graduates to make informed decisions.
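
For instance, graduates should be able to inspect a binary’s runtime dependencies and know what trading them away for a static build costs (a sketch; myprog is a placeholder):

ldd ./myprog                           # list the shared libraries a binary needs
gcc -static -o myprog-static myprog.c  # static build: no library version skew,
                                       # but no shared security updates either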

Backups

I’ve met many people who call themselves computer professionals and think that backups aren’t needed. I’ve seen production systems that were designed in a way that backups were impossible. The lack of backups is a serious problem for the entire industry.

Some lectures about backups could be part of a version control subject in a general CS degree. For a degree that majors in Sysadmin at least one subject about backups is appropriate.

For any backup (even backing up your home PC) you should have offsite backups to deal with fire damage, multiple backups of different ages (especially important now that encryption malware is a serious threat), and a plan for how fast you can restore things.
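
The classic hard-link snapshot idiom covers the multiple-ages requirement cheaply; a sketch, with the offsite host and paths as placeholders:

# Daily snapshot to an offsite host; unchanged files become hard links into
# the previous snapshot, so each day only costs the changed data.
rsync -a --delete --link-dest=../last /home/ backup.example.com:/backups/$(date +%F)/
ssh backup.example.com "ln -nsf $(date +%F) /backups/last"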

The most common use of backups is to deal with the case of deleting the wrong file. Unfortunately this case seems to be the most rarely mentioned.

Another common situation that should be covered is a configuration error that results in a system that won’t boot correctly. It’s a very common problem and one that can be solved quickly if you are prepared but which can take a long time if you aren’t.

For a Sysadmin course it is important to cover backups of systems in remote datacenters.

Hardware

A good CS degree should cover the process of selecting suitable hardware. Programmers often get to advise on the hardware used to run their code, especially at smaller companies. Reliability features such as RAID, ECC RAM, and clustering should be covered.

Planning for upgrades is a very important part of this which is usually not taught. Not only do you need to plan for an upgrade without much downtime or cost, but you also need to plan for what upgrades are possible. Will your system require hardware next year that is more powerful than anything you will be able to buy? If so you need to plan for a cluster now.

For a Sysadmin course some training about selecting cloud providers and remote datacenter hosting should be provided. There are many complex issues that determine whether it’s most appropriate to use a cloud service, hosted virtual machines, hosted physical servers managed by the ISP, hosted physical servers purchased by the client, or on-site servers. Often a large system will involve 2 or more of those options, even some small companies use 3 or more of those options to try and provide the performance and reliability they need at a price they can afford.

We Need Sysadmin Degrees

Covering the basic coding skills takes a lot of time. I don’t think we can reasonably expect a CS degree to cover all that and also give good coverage to sysadmin work. While some basic sysadmin skills are needed by every programmer I think we need to have separate majors for people who want a career in system administration.

Sysadmins need some programming skills, but that’s mostly scripting and basic debugging. Someone whose main job is as a sysadmin can probably expect to never make any significant change to a program that’s more than 10,000 lines long. A large amount of the programming in a CS degree could be replaced by “file a bug report” for a sysadmin degree.

This doesn’t mean that sysadmins shouldn’t be doing software development or that they aren’t good at it. It is noteworthy that the most common job among developers of the Debian distribution of Linux appears to be system administration. Developing an OS involves some of the most intensive and demanding programming. But I think that more than a few people who do such work would have skipped a couple of programming subjects in favour of sysadmin subjects if they were given a choice.

Suggestions

Did I miss anything? What other sysadmin skills should be taught in a CS degree?

Do any universities teach these things now? If so please name them in the comments; it is good to help people find universities that teach them what they want to learn and to help them in their careers.


Francois Marier: Simple remote mail queue monitoring

Wed, 2016-06-08 15:34

In order to monitor some of the machines I maintain, I rely on a simple email setup using logcheck. Unfortunately that system completely breaks down if mail delivery stops.

This is the simple setup I've come up with to ensure that mail doesn't pile up on the remote machine.

Server setup

The first thing I did on the server-side is to follow Sean Whitton's advice and configure postfix so that it keeps undelivered emails for 10 days (instead of 5 days, the default):

postconf -e maximal_queue_lifetime=10d

Then I created a new user:

adduser mailq-check

with a password straight out of pwgen -s 32.

I gave ssh permission to that user:

adduser mailq-check sshuser

and then authorized my new ssh key (see next section):

sudo -u mailq-check -i
mkdir ~/.ssh/
cat - > ~/.ssh/authorized_keys

Laptop setup

On my laptop, the machine from where I monitor the server's mail queue, I first created a new password-less ssh key:

ssh-keygen -t ed25519 -f .ssh/egilsstadir-mailq-check
cat ~/.ssh/egilsstadir-mailq-check.pub

which I then installed on the server.

Then I added this cronjob in /etc/cron.d/egilsstadir-mailq-check:

0 2 * * * francois /usr/bin/ssh -i /home/francois/.ssh/egilsstadir-mailq-check mailq-check@egilsstadir mailq | grep -v "Mail queue is empty"

and that's it. I get a (locally delivered) email whenever the mail queue on the server is non-empty.

There is a race condition built into this setup since it's possible that the server will want to send an email at 2am. However, all that does is send a spurious warning email in that case and so it's a pretty small price to pay for a dirt simple setup that's unlikely to break.

Tim Serong: It’s OK to be Wrong in Public

Wed, 2016-06-08 03:03

I’ve spent a reasonably long time with computers. I’ve been doing something with either software or hardware (mostly software) for pretty close to three quarters of my current lifespan. I started when I was about 10, but (perhaps unsurprisingly) nobody was paying me for my work yet then. Flash forwards a few decades, and I have a gig I rather enjoy with SUSE, working on storage stuff.

OK, “yay Tim”. Enough of the backstory, what’s the point?

The point (if I can ball up my years of experience, and the experience of the world at large), is that, in aggregate, we write better software if we do it in the open. There’s a whole Free Software vs. Open Source thing, and the nuances of that discussion are interesting and definitely important, but to my mind this is all rather less interesting than the mechanics of how F/OSS projects actually work in practice. In particular, given that projects are essentially communities, and communities are made up of individuals, how does an individual join an existing project, and become accepted and confident in that space?

If you’re an individual looking for something to work on, whether or not you think about it in these terms, you’re effectively looking for a community to join. You’re hopefully going to be there for a while.

But you’re one little person, and there’s a big established community that already knows how everything works. Whatever you’re proposing has probably already been thought of by someone else, and your approach is probably wrong. It’s utterly terrifying, especially when anything you push to a git repo or public mailing list will probably be online for the rest of your life.

Fuck that line of thinking. It’s logical reasoning, but it’s utterly unhelpful in terms of joining a project. It might be correct in broad strokes if you squint at it just right, but you’re bringing new eyes to something. You’ll probably see things established community members didn’t, or if not, you’ll be able to help smooth the way for the next newcomer. One of the kinks though is speaking up about $WEIRD_THING_IN_PROJECT. Is it actually broken, or do you just have no idea what’s going on yet because you’re new? Do you speak up? Do you raise a bug? Put in a pull request? Risk shame in public if you’re wrong?

I might be slightly biased. This is either because I’ve been doing this long enough that I no longer suffer too much if someone tells me I’ve made a mistake (I’ve made lots of them, and hopefully learned from all of them), or it’s because I’m a scary looking white dude, dripping with privilege. Probably it’s a mix of both, but the most important thing I think I ever learned is that it’s OK to be wrong in public. If in doubt, you should:

  • Listen for long enough to get a feel for the mailing list (or forum, or whatever).
  • Ask the question you think is stupid.
  • Submit the pull request you hope is helpful, but are actually sure is incomplete or inadequate.
  • Propose the new architecture you’re certain will be shot down.
  • Don’t take it personally if you do get shot down. This can be a crawling horror of difficulty, and only goes away with either arrogance or time (hopefully time, which I’m assured will eventually compost into wisdom).

If you don’t get a helpful answer to the stupid question, if you don’t get constructive feedback for the pull request or new architecture, if some asshole does shoot you down, this is not the project or community for you.

If someone helps you, you might have found something worth pursuing. If that pans out, keep asking stupid questions, and keep submitting pull requests you’re worried about. You’ll learn something, and so will everyone else, and the world will eventually be a better place.

sthbrx - a POWER technical blog: Using the Atom editor for Linux kernel development

Tue, 2016-06-07 17:03

Atom is a text editor. It's new, it's shiny, and it has a lot of good and bad sides. I work in a lab full of kernel developers, and in the kernel, there are no IDEs. There's no real metadata you can get out of your compiler (given the kernel isn't very clang friendly), there's certainly nothing like that you can get out of your build system, so "plain old" text editors reign supreme. It's a vim or Emacs show.

And so Atom comes along. Unlike other shiny new text editors to emerge in the past 10 or so years, it's open source (unlike Sublime Text), it works well on Linux, and it's very configurable. When it first came out, Atom was an absolute mess. There was a noticeable delay whenever you typed a key. That has gone, but the sour impression that comes from replacing a native application with a web browser in a frame remains.

Like the curious person I am, I'm always trying out new things to see if they're any good. I'm not particularly tied to any editor; I prefer modal editing, but I'm no vim wizard. I eventually settled on using Emacs with evil-mode (which I assumed would make both Emacs and vim people like me, but the opposite happened), which was decent. It was configurable, it was good, but it had issues.

So, let's have a look at how Atom stacks up for low-level work. First of all, it's X only. You wouldn't use it to change one line of a file in /etc/, and a lot of kernel developers only edit code inside a terminal emulator. Most vim people do this since gvim is a bit wonky, and Emacs people can double-dip; using Emacs without X for small things and Emacs with X for programming. You don't want to do that with Atom, if nothing else because of its slow startup time.

Now let's look at configurability. In my opinion, no editor will ever match the level of configurability of Emacs, however the barrier to entry is much lower here. Atom has lots of options exposed in a config file, and you can set them there or you can use an equivalent GUI. In addition, a perk of being a browser in a frame is that you can customise a lot of UI things with CSS, for those inclined. Overall, I'd say Emacs > Atom > vim here, but for a newbie, it's probably Atom > Emacs > vim.

Okay, package management. Atom is the clear winner here. The package repository is very easy to use, for users and developers. I wrote my own package, typed apm publish and within a minute a friend could install it. For kernel development though, you don't really need to install anything, Atom is pretty batteries-included. This includes good syntax highlighting, ctags support, and a few themes. In this respect, Atom feels like an editor that was created this century.
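
The whole cycle really is a couple of commands (the package name here is just an example):

apm install linter      # fetch and install a package from atom.io
apm publish minor       # bump your own package's version, tag it and publish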

What about actually editing text? Well, I only use modal editing, and Atom is very far from being the best vim. I think evil-mode in Emacs is the best vim, followed closely by vim itself. Atom has a vim-mode, and it's fine for insert/normal/visual mode, but anything involving a : is a no-go; there's a plugin that tries, but it's entirely useless. If I tried to do a replacement with :s, Atom would lock up and fail to replace the text; vim replaced thousands of occurrences in a second. Other than that, Atom's pretty good. I can move around pretty much just as well as I could in vim or Emacs, but not quite. Also, it supports ligatures! The first kernel-usable editor that does.

Autocompletions feel very good in Atom. It completes within a local scope automatically, without any knowledge of the type of file you're working on. As far as intelligence goes, Atom's support for tags outside of ctags is very lacking, and ctags is stupid. Go-to definition sometimes works, but it lags when dealing with something as big as the Linux kernel. Return-from definition is very good, though. Another downside is that it can complete from any open buffer, which is a huge problem if you're writing Rust in one tab and C in the other.

An experience I've had with Atom that I haven't had with other editors is actually writing a plugin. It was really easy, mostly because I stole a lot of it from an existing plugin, but it was easy. I wrote a syntax highlighting package for POWER assembly, which was much more fighting with regular expressions than it was fighting with anything in Atom. Once I had it working, it was very easy to publish; just push to GitHub and run a command.

Sometimes, Atom can get too clever for its own good. For some completely insane reason, it automatically "fixes" whitespace in every file you open, leading to a huge amount of git changes you didn't intend. That's easy to disable, but I don't want my editor doing that, it'd be much better if it highlighted whitespace it didn't like by default, like you can get vim and Emacs to do. For an editor designed around git, I can't comprehend that decision.

Speaking of git, the editor pretty much has everything you'd expect for an editor written at GitHub. The sidebar shows you what lines you've added, removed and modified, and the gutter shows you what branch you're on and how much you've changed all-up. There's no in-built support for doing git things inside the editor, but there's a package for it. It's pretty nice to get something "for free" that you'd have to tinker with in other editors.

Overall, Atom has come a long way and still has a long way to go. I've been using it for a few weeks and I'll continue to use it. I'll encourage new developers to use it, but it needs to be better for experienced programmers who are used to their current workflow to consider switching. If you're in the market for a new editor, Atom might just be for you.

Clinton Roy: Software Carpentry

Mon, 2016-06-06 23:00

Today I taught my first Software Carpentry talk, specifically the Intro to Shell. By most accounts it went well.

After going through the course today I think I’ve spotted two issues that I’ll try to fix upstream.

Firstly, command substitution is a concept that is covered, and used incorrectly IMO. Command substitution is fine when you know you're only going to get back one value, e.g. running an identify on an image to get its dimensions. But when you're getting back an arbitrarily long list of files, your only option is to use xargs, as contrasted below. Using xargs also means that we can drop another concept to teach.
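
Something like this is the contrast I mean (identify is ImageMagick; paths are examples):

# Command substitution: fine when exactly one value comes back
dims=$(identify -format '%wx%h' photo.jpg)
echo "$dims"

# An arbitrarily long list of files: pipe to xargs instead
find . -name '*.csv' -print0 | xargs -0 wc -l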

The other thing that isn't covered, but I think should be, is reverse i-search of the history buffer (Ctrl-R). It's something that I use in my day-to-day use of the shell; not quite as much as tab completion, but it's certainly up there.

A third, minor issue that I still need to check: I don't think brace expansion was shown in the loop example. I think this should be added, as the example I ended up using showed looping over strings, numbers and file globs, which is everything you ever really end up using; the three forms are sketched below.
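
Concretely (my sketch, not the lesson's own example):

for animal in cat dog fish; do echo "$animal"; done    # a list of strings
for i in {1..5}; do echo "run $i"; done                # brace expansion
for f in *.txt; do wc -w "$f"; done                    # a file glob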

Software Carpentry uses different coloured sticky notes attached to learners’ laptops to indicate how they’re going. It’s really useful as a presenter out the front: if there’s a sea of green you’re good to go; if there are a few reds with helpers you’re probably OK to continue; but if there are too many reds, it’s time to stop and fix the problem. At the end of the session we ask people to give feedback, here for posterity:

Red (bad):

  • Course really should be called Intro to Unix rather than bash
  • use of microphone might be good (difficult to hear, especially when helpers answer questions around)
  • Could have provided an intro into why  unix is advantageous over other programs
  • grep(?) got a bit complicated, could have explained more
  • start session with overview to set context eg. a graphic
  • why does unix shell suck so much, I blame you personally

Orange(not so bad):

  • maybe use the example data a bit more

Green(good):

  • patient, very knowledgeable
  • really knew his stuff
  • information generally easy to follow. good pacing overall good
  • good. referred to help files, real world this as go to for finding stuff out (mistranscribed i’m sure)
  •  good pace, good basis knowledge is taught

Linux Users of Victoria (LUV) Announce: LUV Beginners June Meeting: MPE on SIMH / Programming with Python

Sat, 2016-06-04 23:05
Start: Jun 18 2016 12:30
End: Jun 18 2016 16:30
Location:

Infoxchange, 33 Elizabeth St. Richmond

Link:  http://luv.asn.au/meetings/map

Please note change of topic

Rodney Brown will demonstrate the HP minicomputer OS "Multi-Programming Executive" running on the SIMH emulator.

Andrew Pam will lead a hands-on Python programming class using the learnpython.org website. Suitable for people with no programming skills or with existing skills in other languages. Bring your own laptop or use the desktop machines on site.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121 (enter via the garage on Jonas St.)

Late arrivals, please call (0490) 049 589 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria Inc., is an incorporated association, registration number A0040056C.


Russell Coker: I Just Ordered a Nexus 6P

Wed, 2016-06-01 17:02

Last year I wrote a long-term review of Android phones [1]. I noted that my Galaxy Note 3 only needed to last another 4 months to be the longest I’ve been happily using a phone.

Last month (just over 7 months after writing that) I fell on my Note 3 and cracked the screen. The Amourdillo case is good for protecting the phone [2] so it would have been fine if I had just dropped it. But I fell with the phone in my hand, the phone landed face down and about half my body weight ended up in the middle of the phone which apparently bent it enough to crack the screen. As a result of this the GPS seems to be less reliable than it used to be so there might be some damage to the antenna too.

I was quoted $149 to repair the screen; I could possibly have found a cheaper quote if I had shopped around, but it was a good starting point for comparison. The Note 3 originally cost $550 including postage in 2014. A new Note 4 costs $550 + postage now from Shopping Square and a new Note 3 is on ebay with a buy-it-now price of $380 with free postage.

It seems like bad value to pay 40% of the price of a new Note 3 or 25% of the price of a Note 4 to fix my old phone (which is a little worn and has some other minor issues). So I decided to spend a bit more and have a better phone and give my old phone to one of my relatives who doesn’t mind having a cracked screen.

I really like the S-Pen stylus on the Samsung Galaxy Note series of phones and tablets. I also like having a hardware home button and separate screen space reserved for the settings and back buttons. The downsides to the Note series are that they are getting really expensive nowadays and the support for new OS updates (and presumably security fixes) is lacking. So when Kogan offered a good price on a Nexus 6P [3] with 64G of storage I ordered one. I’m going to give the Note 3 to my father, he wants a phone with a bigger screen and a stylus and isn’t worried about cracks in the screen.

I previously wrote about Android device service life [4]. My main conclusion in that post was that storage space is a major factor limiting service life. I hope that 64G in the Nexus 6P will solve that problem, giving me 3 years of use and making it useful to my relatives afterwards. Currently I have 32G of storage of which about 8G is used by my music video collection and about 3G is free, so 64G should last me for a long time. Having only 3G of RAM might be a problem, but I’m thinking of trying CyanogenMod again so maybe with root access I can reduce the amount of RAM use.


Colin Charles: Speaking in June 2016

Tue, 2016-05-31 19:02

I have a few upcoming speaking engagements in June 2016:

  • Nerdear.la – June 9-10 2016 – Buenos Aires, Argentina – never been to this event but MariaDB Corporation are sponsors and I’m quite excited to be back in Buenos Aires. I’m going to talk about the MySQL ecosystem in 2016.
  • SouthEast LinuxFest – June 10-12 2016 – Charlotte, NC, USA – I have a few talks here, a bit bummed that I’m going to be missing the speaker dinner, but I expect this to be another great year. Learn about MariaDB Server/MySQL Security Essentials, the MySQL ecosystem in 2016, and about distributions from the view of a package.
  • NYC MySQL Meetup – June 27 2016 – New York, USA – I’m going to give a talk on lessons you can learn from other people’s database failures. I did this at rootconf.in, and it was well received so I’m quite excited to give this again.
  • Community Open House for MongoDB – June 30 2016 – New York, USA – I’m going to give my first MongoDB talk at the Community Open House for MongoDB – My First Moments with MongoDB, from the view of someone who’s been using MySQL for a very long time.

So if you’re in Buenos Aires, Charlotte or New York, I’m looking forward to seeing you to talk all things databases and open source.

Binh Nguyen: Is Western Leadership Required? More Social Systems, and More

Tue, 2016-05-31 06:42
Western Leadership stuff: From time to time you hear things about Western leadership being 'required'. Some of it sounds borderline authoritarian/dictatorial at times. I wanted to take a look further at this: - examples of such quotes include "Instead, America should write the rules. America should call the shots. Other countries should play by the rules that America and our partners set, and

Maxim Zakharov: plsh2

Tue, 2016-05-31 01:04

PL/sh is a nice extension to PostgreSQL that allows you to write stored procedures in an interpreted language, e.g. bash, python, perl, php, etc.

I found it useful, though it has a major drawback: the amount of data you can pass via the arguments of such procedures may hit command line limitations, i.e. no more than 254 spaces and no more than 2MB (or even less).

So I have made a change so that the value of the first argument is passed via stdin to the script implementing the stored procedure, while the rest of the arguments are passed as $1, $2, $3, etc. This change allows one to overcome the above-mentioned limitations when a big amount of data is passed via one parameter.

Here is a tiny example of the new functionality that I have added to the test suite:

CREATE FUNCTION perl_concat2(text, text) RETURNS text LANGUAGE plsh2 AS '
#!/usr/bin/perl
print while (<STDIN>);
print $ARGV[0];
';

SELECT perl_concat2('pe', 'rl');

You may get the modified PL/sh from my repository on GitHub: github.com/Maxime2/plsh. It has been implemented as a new procedural language, plsh2, so you do not need to change anything in already created procedures/functions using plsh (and you can continue to use it as before).

Chris Smart: How to find out which process is listening on a port

Mon, 2016-05-30 17:03

Say that you notice UDP port 323 is open (perhaps via netstat -lun) and you’ve no idea what that is!

With lsof it’s easy to find out which process is guilty:


[15:27 chris ~]$ sudo lsof -i :323
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1044 chrony 1u IPv4 19197 0t0 UDP localhost:323
chronyd 1044 chrony 2u IPv6 19198 0t0 UDP localhost:323

In this case, it’s chrony, the modern time keeping daemon.

As John pointed out in the comments, you can also use netstat with the -p flag.

For example, show all processes listening (-l) on both TCP (-t) and UDP (-u) by port number (-n) showing the process (-p), while I grep for port 323 to show what’s running:

[19:08 chris ~]$ sudo netstat -lutnp |grep 323
udp 0 0 127.0.0.1:323 0.0.0.0:* 1030/chronyd
udp6 0 0 ::1:323 :::* 1030/chronyd

David Rowe: Codec 2 Masking Model Part 5

Mon, 2016-05-30 15:04

In the last post in this series I was getting close to a fully quantised 700 bit/s codec. However as I pushed through I discovered a bug in the post-filter. I was accidentally cheating and using some of the encoder information in the decoder. When I corrected the bug the quality dropped significantly. I’ve hit these sorts of bugs before – the simulation code is complex and it’s easy to “declare victory” prematurely.

So I have abandoned the AbyS approach for now. Oh well, that’s “research and disappointment” for you. Plenty of new ideas though….

For the last few months I have been working on another solution that vector quantises a “fixed rate” version of the spectrum. The masking functions are still used to smooth the spectrum before sampling at the fixed rate. Much like we low pass filter time domain samples before sampling, the masking functions reduce the “bandwidth” and hence sample “rate” we need to represent the spectrum. Here is a block diagram of the current “700C” candidate codec:

The bit allocation is pitch (Wo) 6 bits, 1 bit for voicing, 16 bits for the amplitude VQ, 4 bits for energy and 1 bit spare. All updated every 40ms. The new work is in the “Decimate in Frequency” block, expanded here:

As the pitch of the speech varies, the number of harmonics used to represent the speech, L, varies. The goal is to take a vector of L amplitude samples, vector quantise it, and send it over a channel. To vector quantise them we need fixed length vectors. So a Discrete Fourier Transform (DFT) is used to resample the L amplitude samples to fixed vectors of length 2k=20 (I have chosen k=10).
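
In outline (my notation, a sketch of the resampling idea rather than the codec's exact transform): the encoder takes a forward DFT of the L amplitude samples and keeps only the k lowest-frequency coefficients, and the decoder interpolates back to whatever L the current frame needs:

B_m = \sum_{l=1}^{L} A_l \, e^{-j 2\pi l m / L}, \quad m = 0, 1, \ldots, k-1

\hat{A}_l = \frac{1}{L} \left( B_0 + 2 \sum_{m=1}^{k-1} \mathrm{Re}\{ B_m \, e^{j 2\pi l m / L} \} \right)

The k complex coefficients supply the 2k=20 real numbers of the fixed-length vector (with the DC term pulled out separately as the frame energy, as described below).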

BTW a DFT is the generic form of a Fast Fourier Transform (FFT). A FFT is a computationally efficient (fast) way of computing a DFT.

The steps are similar to sampling a time domain signal. The bandwidth of the signal is limited by using the masking function to smooth the variations in the amplitude envelope. The use of masking functions means the smoothing matches the response of the ear, and no perceptually important information is lost.

I’ve recently been playing with OFDM modems, so I used a “cyclic suffix” to further smooth the DFT coefficients. DFTs like cyclic signals. If you have a DFT of an 8kHz signal, the sample at 3900Hz is “close” to the sample at 0Hz. If there is a step jump in amplitude you get a lot of high frequency information in the DFT coefficients, which is harder to quantise. So I throw away the last 500Hz of the speech signal (3500-4000Hz), and replace it with a curve that ensures a smooth match between 3500Hz and 0Hz.

Yeah, I don’t know how I dream this stuff up either …… do I use the Force? Too much red wine or espresso? Experience? A life misspent on computers? Subconscious innovation? Plagiarism?

In the past I’ve tried to resample and VQ the spectrum of sinusoidal codecs a few times, without much success. Jean Marc also suggested something similar a few posts back. Anyhoo, getting somewhere this time around.

Here are some plots that show the algorithm in action for a frame of female speech:

Here are the amplitude samples (red crosses). The blue line has the cyclic suffix, note how it meets the first amplitude sample near 0Hz.

This figure shows the difference in the DFT coefficients with (blue) and without (green) the cyclic suffix:

Here is the cumulative energy of DFT coefficients, note that with the cyclic suffix (blue) low frequency energy dominates:

This figure shows a typical 2k=20 length vector that we vector quantise. Note it has zero mean – we extract the DC coefficient and separately quantise this as the frame energy.

Samples

Sample       1300     700C Candidate
hts1a        Listen   Listen
hts2a        Listen   Listen
forig        Listen   Listen
ve9qrp_10s   Listen   Listen
mmt1         Listen   Listen
vk5qi        Listen   Listen
cq_ref       Listen   Listen

(Each “Listen” entry was an audio link in the original post.)

Through a couple of years of on-air operation we have established that the 1300 bit/s codec (as used in FreeDV 1600 with 300 bit/s of FEC) has acceptable speech quality for HF. So the goal of this work is similar quality at 700 bit/s.

For some samples above (e.g. hts1a and mmt1a), 1300 is superior to the current 700C candidate. For others (e.g. hts2a and vk5qi) 700 sounds a little better. So I think I’m in the ball park.

There’s a bit of clipping at the start of cq_ref, and some level variations between the two modes on some samples. The 700C candidate has a few problems with unvoiced sounds, e.g. the intake of breath on ve9qrp_10, and the “ch” sound at the start of chicken in hts2a. Not sure why.

The cq_ref_1300 sample is a bit poor as the LPC technique used for spectral amplitudes falls over when the spectral dynamic range is high. In this sample the LF energy has much higher energy than the HF, i.e. a strong “Low Pass Filter” effect or spectral slope.

Next step is some refactoring – the Octave code is an untidy mess of 6 months of dead ends and false starts. A mirror of real world R&D I guess. Creating something new is not a tidy process. At least in my head. So many aspects of this algorithm that I could explore but I’d rather get this on the air and see if we really have something here. Would love to have some help with a port from Octave to C. Contact me if you’d like to work in this area.

Michael Still: Oryx and Crake

Fri, 2016-05-27 21:00



ISBN: 9780385721677
LibraryThing
I bought this book ages ago, on the recommendation of a friend (I don't remember who), but I only just got around to reading it. It's a hard book to read in places -- it's not hopeful, or particularly fun, and it's confronting in places -- especially the plot that revolves around child exploitation. There's very little to like about the future society that Atwood posits here, but perhaps that's the point.

Despite not being a happy fun story, the book made me think about things like genetic engineering in a way I didn't before and I think that's what Atwood was seeking to achieve. So I'd have to describe the book as a success.

Tags for this post: book margaret_atwood apocalypse genetic_engineering

Francois Marier: Using OpenVPN on iOS and OSX

Fri, 2016-05-27 16:03

I have written instructions on how to connect to your own OpenVPN server using Network Manager as well as Android.

Here is how to do it on iOS and OSX assuming you have followed my instructions for the server setup.

Generate new keys

From the easy-rsa directory you created while generating the server keys, create a new keypair for your phone:

./build-key iphone # "iphone" as Name, no password

and for your laptop:

./build-key osx # "osx" as Name, no password

Using OpenVPN Connect on iOS

The app you need to install from the App Store is OpenVPN Connect.

Once it's installed, connect your phone to your computer and transfer the following files using iTunes:

  • ca.crt
  • iphone.crt
  • iphone.key
  • iphone.ovpn
  • ta.key

You should then be able to select it after launching the app. See the official FAQ if you run into any problems.

iphone.ovpn is a configuration file that you need to supply since the OpenVPN Connect app doesn't have a configuration interface. You can use this script to generate it or write it from scratch using this template.

On Linux, you can also create a configuration file using Network Manager 1.2 with the following command:

nmcli connection export hafnarfjordur > iphone.ovpn

though that didn't quite work in my experience.

Here is the config I successfully used to connect to my server:

client
remote hafnarfjordur.fmarier.org 1194
ca ca.crt
cert iphone.crt
key iphone.key
cipher AES-256-CBC
auth SHA384
comp-lzo yes
proto udp
tls-remote server
remote-cert-tls server
ns-cert-type server
tls-auth ta.key 1

Using Viscosity on Mac OSX

One of the possible OpenVPN clients you can use on OSX is Viscosity.

Here are the settings you'll need to change when setting up a new VPN connection:

  • General
    • Remote server: hafnarfjordur.fmarier.org
  • Authentication
    • Type: SSL/TLS client
    • CA: ca.crt
    • Cert: osx.crt
    • Key: osx.key
    • Tls-Auth: ta.key
    • direction: 1
  • Options
    • peer certificate: require server nsCertType
    • compression: turn LZO on
  • Networking
    • send all traffic on VPN
  • Advanced
    • add the following extra OpenVPN configuration commands:

cipher AES-256-CBC
auth SHA384

Matthew Oliver: Adventures in Container Sharding – SQLite performance problem and the pivot point.

Fri, 2016-05-27 15:03

Hey world it’s been a while, turns out I’m not much of a blogger. But I know how useful for myself it is to do write-ups occasionally so I can actually find them later.. having said that. In my last post I mentioned I was an OpenStack Developer.. and it’s still true. I spend my time hacking and working on Openstack Swift the awesome OpenSource object storage cluster.

One thing I’ve been trying to tackle recently is container sharding in Swift. I will not go into full details as there is a Swift Spec that is relatively recent, and I’ve also given a high level talk on it at LCA in Geelong.

The tl;dr being: Swift accounts and containers (or the metadata layer of Swift) are SQLite databases that get treated like objects themselves and replicated throughout the cluster. Which works amazingly well. Until you add millions and millions of objects to a container. And what I’m talking about here is container level object metadata, not the objects themselves. When this happens, SQLite being a file starts to have latency and locking issues, as one would expect. The solution is to shard these container databases throughout the cluster, which is what I’ve been working on.

At the last OpenStack summit in Austin, the awesome people at SwiftStack, whom I work quite closely with in the community gave me a container database they generated that has 700,000,000 objects in it (metadata again). This SQLite file is about 105G so not small. Plugging this into a small cluster I have to test my sharding implementation has been interesting to say the least.

When sharding a container down, we have a simple idea: split it in half. That is to say, find someplace in the object table to pivot on. We can then keep pivoting, giving us a list of ranges (which can be treated as a binary tree). The problem is finding the pivot point. In all my testing up til now I had what I thought was the perfect and simple way:

SELECT name
FROM object
WHERE deleted = 0
ORDER BY name
LIMIT 1 OFFSET (
    SELECT object_count / 2
    FROM policy_stat);

This did amazingly well in all my tests.. but I obviously never got big enough. This simple SQL statement would do plenty well if sharding in Swift were turned on from day dot. The plan, though, is that once sharding is solved in this POC it will be added to Swift as a beta feature that can be turned on at a container by container basis; after it graduates from beta it will remain a switch, and only once we are confident in its ability will it be on permanently. In the latter case containers would never get big enough to worry about. In the earlier stages, however, a user would only turn it on when the container is already _very_ slow.

Using the pivot SQL statement on the large container ground things to a halt. I’m sure it would have come back eventually, but I got tired of waiting after what seemed like ages.. there had to be a better way.

Turns out the OFFSET clause in SQLite, even when hitting an index, still does a scan to find the offset. This is slow when you get to a very large table. Under the hood an index is stored as a doubly-linked list, which I’m sure has optimisations, but anyway I was struggling to think of a good way to find a good enough middle value without some table scanning. You can see from the SQL statement that we know how many objects we have in the container, but the problem is that because Swift is eventually consistent we need to temporarily store objects that have been deleted. So randomly picking an index doesn’t help, and it won’t necessarily be in name order.

So on really large containers OFFSET needs to be thrown out the window. Turns out the sharding implementation can deal with shrinking the number of shards, merging smaller ranges together, not just growing/splitting. This means we don’t actually need to be exact; we also don’t need to split on an existing object, just on a name that would be somewhere in the middle, and so long as it cuts down the large container it’s good enough. So what can we do?

Turns out there is an optimisation in SQLite: because an index is a doubly-linked list ordered by its key, it’s really quick if all we want to do is go to the first or last element. So that’s what I’ve done:

SELECT min(name) as name FROM object WHERE deleted = 0;
SELECT max(name) as name FROM object WHERE deleted = 0;

These two statements are blindingly fast due to the fact that we already have a compound index on name and deleted (for cleaning up). Note however that they have to be run as two separate commands; combine the two into one and you lose the optimisation and have to scan all elements. Having the min and max name is a good start, and even when dealing with already sharded containers these are just smaller ranges, so this still works. The question is: now what?

In a perfect world we have an even distribution of objects between the min and max names, so we just need to find a middle name between the two to pivot on. Turns out even in an unevenly distributed container we will still be shrinking it, at worst by only a few objects. But these will be cleaned up later (merged into a neighbouring range by the implementation). And so long as the container gets smaller, eventually it’ll shrink small enough to be usable.

The next step is finding the middle value; to do this I just wrote some python:

from itertools import izip_longest
import sys

lower = unicode(sys.argv[1])
upper = unicode(sys.argv[2])

def middle_str(str1, str2):
    result = []
    for l, u in izip_longest(map(ord, str1), map(ord, str2), fillvalue=0):
        result.append((l + u) // 2)
    return u''.join(map(unichr, result))

if __name__ == "__main__":
    print(middle_str(lower, upper))

What does it do? Calling middle_str(min, max) will grab the unicode versions of the strings, turn them into their integer values, find the middle value at each position, and turn the result back into a word -- after matching the prefix, that is. So:

$ python middle_str.py 'aaaaaaaaa' 'zzzzzzzzz'
mmmmmmmmm

$ python middle_str.py 'aaaaaaaaa' 'aazzzzzzz'
aammmmmmm

$ python middle_str.py 'DFasjiojsaoi' 'ZZsdkmfi084f'
OPjkjkjiQLQg

I am now plugging this into my implementation and lets tackle this large container again.

Binh Nguyen: More Social and Economic Systems, Music Stuff, and More

Fri, 2016-05-27 00:07
- modern economics just gibberish. St Louis Federal Reserve Database. Unemployment data distorted. One hour at soup kitchen (even volunteering) is 'employed'? International rule not Western centric. This rule (and others) developed in last 20 years. US still in sustained slump? Shadow stats. Neo-liberalism assumes that private sector will allocate more effectively? Using margin to control and

Joshua Hesketh: Third party testing with Turbo-Hipster

Mon, 2016-05-23 13:05

Why is this hipster voting on my code?!

Soon you are going to see a new robot barista leaving comments on Nova code reviews. He is obsessed with espresso, that band you haven’t heard of yet, and easing the life of OpenStack operators.

Doing a large OpenStack deployment has always been hard when it came to database migrations. Running a migration requires downtime, and when you have giant datasets that downtime could be hours. To help catch these issues Turbo-Hipster (http://josh.people.rcbops.com/2013/09/building-a-zuul-worker/) will now run your patchset’s migrations against copies of real databases. This will give you valuable feedback on the success of the patch, and how long it might take to migrate.

Depending on the results, Turbo-Hipster will add a review to your patchset that looks something like this:

What should I do if Turbo-Hipster fails?

That depends on why it has failed. Here are some scenarios and steps you can take for different errors:

FAILURE – Did not find the end of a migration after a start

  • If you look at the log you should find that a migration began but never finished. Hopefully there’ll be a traceback for you to follow through to get some hints about why it failed.

WARNING – Migration %s took too long

  • In this case your migration took a long time to run against one of our test datasets. You should reconsider what operations your migration is performing and see if there are any optimisations you can make, or if each step is really necessary. If there is no way to speed up your migration you can email us at rcbau@rcbops.com for an exception.

FAILURE – Final schema version does not match expectation

  • Somewhere along the line the migrations stopped and did not reach the expected version. The datasets start at previous releases and have to upgrade all the way through. If you see this, inspect the log for tracebacks or other hints about the failure.

FAILURE – Could not setup seed database. FAILURE – Could not find seed database.

  • These two are internal errors. If you see either of these, contact us at rcbau@rcbops.com to let us know so we can fix and rerun the tests for you.

FAILURE – Could not import required module.

  • This error probably shouldn’t happen as Jenkins should catch it in the unit tests before Turbo-Hipster launches. If you see this, please contact us at rcbau@rcbops.com and let us know.

If you receive an error that you think is a false positive, leave a comment on the review with the sole contents of recheck migrations.

If you see any false positives or have any questions or problems please contact us on rcbau@rcbops.com

Joshua Hesketh: git.openstack.org adventures

Mon, 2016-05-23 13:05

Over the past few months I started to notice occasional issues when cloning repositories (particularly nova) from git.openstack.org.

It would fail with something like

git clone -vvv git://git.openstack.org/openstack/nova .
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

The problem would occur sporadically during our 3rd party CI runs causing them to fail. Initially these went somewhat ignored as rechecks on the jobs would succeed and the world would be shiny again. However, as they became more prominent the issue needed to be addressed.

When a patch merges in gerrit it is replicated out to 5 different cgit backends (git0[1-5].openstack.org). These are then balanced by two HAProxy frontends which are on a simple DNS round-robin.

               +-------------------+
               | git.openstack.org |
               |   (DNS Lookup)    |
               +--+-------------+--+
                  |             |
         +--------+             +--------+
         |          A records            |
+--------v---------------+  +------------v-----------+
| git-fe01.openstack.org |  | git-fe02.openstack.org |
|   (HAProxy frontend)   |  |   (HAProxy frontend)   |
+-----------+------------+  +------------+-----------+
            |                            |
            +-----+                  +---+
                  |                  |
+-----------------v------------------v-----------+
| +---------------------+    (source algorithm)  |
| | git01.openstack.org |                        |
| +---------------------+                        |
| | git02.openstack.org |                        |
| +---------------------+                        |
| | git03.openstack.org |                        |
| +---------------------+                        |
| | git04.openstack.org |                        |
| +---------------------+                        |
| | git05.openstack.org |                        |
| |  (HAProxy backend)  |                        |
| +---------------------+                        |
+------------------------------------------------+

Reproducing the problem was difficult. At first I was unable to reproduce locally, or even on an isolated turbo-hipster run. Since the problem appeared to be specific to our 3rd party tests (little evidence of it in 1st party runs) I started by adding extra debugging output to git.

We were originally cloning repositories via the git:// protocol. The debugging information was unfortunately limited and provided no useful diagnosis. Switching to https allowed for more CURL output (when using GIT_CURL_VERBOSE=1 and GIT_TRACE=1) but this in itself just created noise. It actually took me a few days to remember that the servers are running arbitrary code anyway (a side effect of testing) and therefore cloning over the potentially insecure http protocol didn’t add any further risk.

Over http we got a little more information, but still nothing that was conclusive at this point:

git clone -vvv http://git.openstack.org/openstack/nova .
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header

After a bit it became more apparent that the problems would occur mostly during high (patch) traffic times; that is, when a lot of tests need to be queued. This led me to think that either the network turbo-hipster was on was flaky when doing multiple git clones in parallel, or the git servers were flaky. The lack of similar upstream failures led me to initially suspect the former. In order to reproduce, I decided to use Ansible to do multiple clones of repositories and see if that would uncover the problem. If needed I would then have extended this to orchestrating other parts of turbo-hipster, in case the problem stemmed from something else.

Firstly I needed to clone from a bunch of different servers at once to simulate the network failures more closely (rather than doing multiple clones on the one machine, or from the one IP in containers, for example). To simplify this I decided to learn some Ansible to launch a bunch of nodes on Rackspace (instead of doing it by hand).

Using the pyrax module I put together a crude playbook to launch a bunch of servers. There is likely much neater and better ways of doing this, but it suited my needs. The playbook takes care of placing appropriate sshkeys so I could continue to use them later.

---
- name: Create VMs
  hosts: localhost
  vars:
    ssh_known_hosts_command: "ssh-keyscan -H -T 10"
    ssh_known_hosts_file: "/root/.ssh/known_hosts"
  tasks:
    - name: Provision a set of instances
      local_action:
        module: rax
        name: "josh-testing-ansible"
        flavor: "4"
        image: "Ubuntu 12.04 LTS (Precise Pangolin) (PVHVM)"
        region: "DFW"
        count: "15"
        group: "raxhosts"
        wait: yes
      register: raxcreate

    - name: Add the instances we created (by public IP) to the group 'raxhosts'
      local_action:
        module: add_host
        hostname: "{{ item.name }}"
        ansible_ssh_host: "{{ item.rax_accessipv4 }}"
        ansible_ssh_pass: "{{ item.rax_adminpass }}"
        groupname: raxhosts
      with_items: raxcreate.success
      when: raxcreate.action == 'create'

    - name: Sleep to give time for the instances to start ssh
      # there is almost certainly a better way of doing this
      pause: seconds=30

    - name: Scan the host key
      shell: "{{ ssh_known_hosts_command }} {{ item.rax_accessipv4 }} >> {{ ssh_known_hosts_file }}"
      with_items: raxcreate.success
      when: raxcreate.action == 'create'

- name: Set up sshkeys
  hosts: raxhosts
  tasks:
    - name: Push root's pubkey
      authorized_key: user=root key="{{ lookup('file', '/root/.ssh/id_rsa.pub') }}"

From here I can use Ansible to work on those servers using the rax inventory. This allows me to address any nodes within my tenant and then log into them with the seeded sshkey.

The next step of course was to run tests. Firstly I just wanted to reproduce the issue, so the playbook crudely sets up an environment where it can simply clone nova multiple times.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Clone nova
      shell: "git clone http://git.openstack.org/openstack/nova"

By default Ansible runs with 5 forked processes, meaning that Ansible works on 5 servers at a time. We want to exercise git heavily (in the same way turbo-hipster does) so we use the --forks param to run the clone on all the servers at once. The plan was to keep launching servers until the error reared its head from the load.
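
For reference, the invocation is along these lines (rax.py being Ansible's Rackspace dynamic inventory script; the playbook name is hypothetical):

ansible-playbook -i rax.py clone_nova.yml --forks 100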

To my surprise this happened with very few nodes (less than 15, but I left that as my minimum testing). To confirm I also ran the tests after launching further nodes to see it fail at 50 and 100 concurrent clones. It turned out that the more I cloned the higher the failure rate percentage was.

Now that I had the problem reproducing, it was time to do some debugging. I modified the playbook to capture tcpdump information during the clone. Initially git was cloning over IPv6 so I turned that off on the nodes to force IPv4 (just in case it was a v6 issue, but the problem did present itself on both networks). I also locked git.openstack.org to one IP rather than randomly hitting both front ends.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  vars:
    cap_file: tcpdump_{{ ansible_hostname }}_{{ ansible_date_time['epoch'] }}.cap
  tasks:
    - name: Disable ipv6 1/3
      sysctl: name="net.ipv6.conf.all.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 2/3
      sysctl: name="net.ipv6.conf.default.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 3/3
      sysctl: name="net.ipv6.conf.lo.disable_ipv6" value=1 sysctl_set=yes
    - name: Restart networking
      service: name=networking state=restarted
    - name: Lock git.o.o to one host
      lineinfile: dest=/etc/hosts line='23.253.252.15 git.openstack.org' state=present
    - name: start tcpdump
      command: "/usr/sbin/tcpdump -i eth0 -nnvvS -w /tmp/{{ cap_file }}"
      async: 6000000
      poll: 0
    - name: Clone nova
      shell: "git clone http://git.openstack.org/openstack/nova"
      #shell: "git clone http://github.com/openstack/nova"
      ignore_errors: yes
    - name: kill tcpdump
      command: "/usr/bin/pkill tcpdump"
    - name: compress capture file
      command: "gzip {{ cap_file }} chdir=/tmp"
    - name: grab captured file
      fetch: src=/tmp/{{ cap_file }}.gz dest=/var/www/ flat=yes

This gave us a bunch of compressed capture files that I was then able to seek the help of my colleagues to debug (a particular thanks to Angus Lees). The results from an early run can be seen here: http://119.9.51.216/old/run1/

Gus determined that the problem was due to a RST packet coming from the source at roughly 60 seconds. This indicated it was likely we were hitting a timeout at the server or a firewall during the git-upload-pack of the clone.

The solution turned out to be rather straightforward. The git-upload-pack had simply grown too large and would time out depending on the load on the servers. There were timeouts in apache as well as in the HAProxy config for both frontend and backend responsiveness. The relevant patches can be found at https://review.openstack.org/#/c/192490/ and https://review.openstack.org/#/c/192649/

While upping the timeout avoids the problem, certain projects are clearly pushing the infrastructure to its limits. As such a few changes were made by the infrastructure team (in particular James Blair) to improve git.openstack.org’s responsiveness.

Firstly, git.openstack.org is now a higher performance (30GB) instance. This is a large step up from the 8GB instances that were previously used as the frontends. Moving to one frontend additionally meant the HAProxy algorithm could be changed to leastconn to help balance connections better (https://review.openstack.org/#/c/193838/).

          +--------------------+
          | git.openstack.org  |
          | (HAProxy frontend) |
          +----------+---------+
                     |
                     |
+--------------------v----------------------------+
| +---------------------+  (leastconn algorithm)  |
| | git01.openstack.org |                         |
| +---------------------+                         |
| | git02.openstack.org |                         |
| +---------------------+                         |
| | git03.openstack.org |                         |
| +---------------------+                         |
| | git04.openstack.org |                         |
| +---------------------+                         |
| | git05.openstack.org |                         |
| |  (HAProxy backend)  |                         |
| +---------------------+                         |
+-------------------------------------------------+

All that was left was to see if things had improved. I reran the test across 15, 30 and then 45 servers. These were all able to clone nova reliably where they had previously been failing. I then upped it to 100 servers, where the cloning began to fail again.

Post-fix logs for those interested:
http://119.9.51.216/run15/
http://119.9.51.216/run30/
http://119.9.51.216/run45/
http://119.9.51.216/run100/
http://119.9.51.216/run15per100/

At this point, however, I’m basically performing a Distributed Denial of Service attack against git. As such, while the servers aren’t immune to a DDoS the problem appears to be fixed.

Joshua Hesketh: New Blog

Mon, 2016-05-23 13:05

Welcome to my new blog.

You can find my old one here: http://josh.opentechnologysolutions.com/blog/joshua-hesketh

I intend on back-porting those posts into this one in due course. For now though I’m going to start posting about my adventures in openstack!