# Quiz time

Everybody loves a quiz – so here’s one on integer maths in C. I got 10/20 fully correct (i.e. ignoring the ‘partial credits’) – which isn’t too brilliant.

NB: WordPress won’t let me tag this as ‘C’, hence ‘C++’

# How much control can Microsoft exert in the development ecosystem?

It’s a long time – over a decade – since I last used a Microsoft development tool. For what it’s worth, I quite liked Visual C++ back then, but in the middle of my subscription (in 1998 if I remember correctly) Microsoft just tore up the contract and offered me something of less use. The details escape me now but it was a formative moment – I was not willing to trust them any more and suddenly the idea of using Linux had new appeal.

(At the same time I was teaching myself how to use Unix on the Sun OS system that hosted the Labour Party’s Excalibur EFS document system: after the 1997 general election there was nobody left in the team but me and so I went out and bought the venerable Essential System Administration: Tools and Techniques for Linux and Unix Administration: Help for UNIX System Administrators – which helped but it was still hard going to escape from the Windows monoculture. The Party’s IT staff essentially said “Unix? Does anyone still use that?” – this was truly the apogee of Microsoft’s monopolistic drive.)

Today the world seems very different. But if Microsoft have taken a few knocks and people are almost as likely to think of Apple and even Google as the enemies of software diversity and freedom, we should not underestimate its raw power in the market. Most desktops in most workplaces are still Windows boxes and writing for the mass market means targeting Microsoft’s operating system.

Not surprisingly many, probably most, of the people who are doing that are using Microsoft’s tools and compilers. That gives the boys and girls in Redmond a lot of power and they don’t have to be acting in a malicious way to have an detrimental impact – as in their refusal to support the C99 language standard. The only mantra that C is a proper subset of C++ (which Microsoft fully support) has been dead for a few years now and there are features in C99, which is the standard in general use in the Unix/Linux development world not supported in C++11 (the current standard there which Microsoft are working to support). Microsoft do support the twenty year old C90 standard but have essentially said that they are not going to develop that any further – you can read more about this here.

In response to this some C++ developers have done so far as to say “C is obsolete” – perhaps reflecting a new confidence in the C++ development world, as that language has been making something of a comeback in the last couple of years (not least because Microsoft have promoted it so heavily).

That may or may not be the case – personally I doubt it very much. But since when did we allow tool manufacturers to make that decision for us?

As I did in 1998 maybe it is time for the developers to look at the alternatives. There are plenty of industrial strength compilers and editors out there that will free them from the caprice of a company which is once more demonstrating it just doesn’t get it.

# A question for a C guru

Image via Wikipedia

The debugging continues, but unfortunately I have not yet been able to identify what is fundamentally broken with my file system code.

But along the way I have spotted and fixed various bugs. Here is one, I am not even sure why it compiled in the first place, so maybe a C guru could tell me…

Was:

if le16_to_cpu(((u16 *) bh->b_data)[j * VMU_DIR_RECORD_LEN16 + VMUFAT_FIRSTBLOCK_OFFSET16] == ino)

Now:

if (le16_to_cpu(((u16 *) bh->b_data)[j * VMU_DIR_RECORD_LEN16 + VMUFAT_FIRSTBLOCK_OFFSET16]) == ino)

# Making sense of Android’s complex development process

Image via CrunchBase

Back in about 1997 I bought a book about this new programming environment – it seemed something bigger than a language but smaller than an operating system – called Java.

Back then the idea seemed great – write once, run anywhere – but there was a lot of scepticism and, of course, Microsoft tried to poison the well through the tactics of “embrace and extend” with their J++ offering. All of that made it look as though Java was going nowhere.

I wrote a couple of applets – one was a “countdown” timer for Trevor Philips‘s mayoral election website in 1999, another was a SAX based parser for the largely Perl-based content management system I wrote for the Scottish Labour Party the following year, ahead of the 2001 election. But no one seemed to like applets much – it seems ridiculous now, but the 90K download needed for the SAX parser really slowed down the Scottish party’s site, even though I was pretty proud of the little newsticker it delivered (along with annoying teletype noises as it went). I forgot about Java.

But, of course, that was wrong. Java is programming language du jour these days, though Microsoft’s responses to the success of Java and the failure of J++, C# and .net, are also big.

Android is, of course, Java’s most prominent offer these days – literally millions of people will be running Android apps even as I write this and thousands of Android phones are being bought across the world daily. Time to get reacquainted, especially as my new job is once more about political communications.

But, as I discovered with C++ when I came back to it after over a decade for my MSc, Java has moved on a fair bit in that time and, unlike C++, I cannot say all the progress seems to be positive. Indeed Java seems to thrive on a particularly ugly idiom with developers being encouraged to write constructors of anonymous classes in the headers of functions – ugh.

I can certainly see the beauty of Groovy more clearly than ever, too. Though being an old time Perl hacker makes me resent Java’s heavy duty static typing in any case.

To help me through all this I have been reading O’Reilly‘s Programming Android: Java Programming for the New Generation of Mobile Devices. Now, usually O’Reilly’s books are all but guaranteed to be the best or close to the best of any on offer, but I have my doubts that is the case with this one – it seems to be sloppily edited (eg at different times it is difficult to follow whether one is being advised to use the Android SDK or the Eclipse editor) and falls between being a comprehensive introduction to Android programming and a guide for Java hackers to get on with it. It feels less than ordered, to be honest.

Now, maybe this is a function of the language and the complexity of the environment, I don’t know. But I would welcome any alternative recommendations if anyone has some.

# Some RegEx stuff

Image by jl_2 via Flickr

So, I was trying to match results from /proc/pid/stat which gives a line that looks like this:

2321 (squid) S 1 2321 2321 0 -1 4202752 4841 0 0 0 530 577 0 0 20 0 1 0 24716 36311040 4417 18446744073709551615 1 1 0 0 0 0 0 4096 85571 18446744073709551615 0 0 17 0 0 0 1945 0 0

And where the 10th entry is the number of minor faults and the 12th entry is the number of major faults (as you can see here, Squid has had 4841 minor faults and 0 major faults since it was restarted when I changed IP address).

So a RegEx seemed to be the way to go and my first attempt looked like this:

(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+

But this did not work … well, it matched the line but it gave me bad results. [Note: \s matches whitespace, \S matches non-whitespace.]

I am sure you are all cleverer than me and saw the flaw straight away – but it took me some time to figure it out: (\\S)+ would treat only the first character of 4841 as a match -  what I needed to use was (\\S+) which matched the group and not just the character.

And… further to my querying of the poorly written GNU RegEx documentation, the nmatch parameter should be one bigger than the number of groups expected to be matched – in the above case that means 13.

# Confused about GNU regex documentation

The documentation for the regex (regular expression) functionality of the GNU C library says this:

When regexec matches parenthetical subexpressions of pattern, it records which parts of string they match. It returns that information by storing the offsets into an array whose elements are structures of type regmatch_t. The first element of the array (index 0) records the part of the string that matched the entire regular expression. Each other element of the array records the beginning and end of the part that matched a single parenthetical subexpression.

It then goes on to say this:

When you call regexec, you specify how long the matchptr array is, with the nmatch argument. This tells regexec how many elements to store. If the actual regular expression has more than nmatch subexpressions, then you won’t get offset information about the rest of them. But this doesn’t alter whether the pattern matches a particular string or not.

So does this mean that nmatch should contain the number of subexpressions and the array should be one bigger (as element 0 matches the whole string) or that nmatch should be equal to the size of the array?

# Dennis Ritchie

Image via Wikipedia

Dennis Ritchie, the inventor of the C programming language and the Unix operating system, has died.

If great men stand on the shoulders of giants, then Ritchie was one of the greatest giants of all.

Unix (including Mac OSX), Linux and Windows were or are written in C. (In Windows’s case it has now moved on to C++, a direct descendant of C). Your iPhone‘s operating system iOS is written in ObjectiveC which, as the name suggests, is also a derivative of the great man’s work.

# In continued praise of “Programming Pearls”

I have read two books in the last year that have fundamentally changed the way I think about computers and programming – The Annotated Turing and Programming Pearls.

Programming Pearls in particular continues to inspire me in the way it makes you think about building better algorithms and using data structures better – and here’s a short tale about what I have just done (and why I did it) with my valext program.

Valext runs a program under step tracing to allow the detailed examination of patterns of memory use – interrogating the /proc/pid/pagemap at every step.

The way I was doing this was to query the /proc/pid/maps, find the blocks that were being mapped and then seek through /proc/pid/pagemap, reading off the page status.

For every page that was mapped I would build a data structure and stick it into a linked list -

struct blocklist {
struct blocklist *nextblock;
};
...
{
block = malloc(sizeof (struct blocklist));
block->nextblock = NULL;
if (*lastblock == NULL){
*lastblock = block;
} else {
(*lastblock)->nextblock = block;
*lastblock = block;
}
}


But the problem with this code – an $\mathcal{O}(n)$ algorithm – is that performance stinks. I was running it on a very underpowered laptop and got to about four days of wallclock time, in which time about 2 million steps had been taken.

So today I rewrote it with an essentially $\mathcal{O}(1)$ algorithm – I allocate an array of arbitrary size (512 elements in current code) and fill that up – with a 0 value being the guard (ie marking the end). If more than 512 elements are needed then another 512 will be allocated and chained to the top and so on…


struct blockchain {
int size;
struct blockchain *tail;
};

uint64_t* blockalloc(int size)
{
uint64_t *buf = calloc(size, sizeof(uint64_t));
return buf;
}

struct blockchain *newchain(int size)
{
struct blockchain *chain = NULL;
chain = calloc(1, sizeof(struct blockchain));
if (!chain)
return NULL;
free(chain);
return NULL;
}
chain->size = size;
return chain;
}

/* recursively free the list */
void cleanchain(struct blockchain *chain)
{
if (!chain)
return;
cleanchain(chain->tail);
free(chain);
return;
}

/* set up a list */
struct blockchain* getnextblock(struct blockchain** chain,
{
int match, t = 0;
uint64_t i;
const char* pattern;
regex_t reg;

pattern = "^([0-9a-f]+)-([0-9a-f]+)";
if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
goto ret;
match = regexec(&reg, buf, (size_t)3, addresses, 0);
if (match == REG_NOMATCH || match == REG_ESPACE)
goto cleanup;
if (*chain == NULL) {
*chain = newchain(MEMBLOCK);
if (*chain == NULL)
goto cleanup;
}
{
if (t >= MEMBLOCK) {
struct blockchain *nxtchain = newchain(MEMBLOCK);
if (!nxtchain)
goto cleanup;
(*chain)->tail = nxtchain;
*chain = nxtchain;
t = 0;
}
t++;
}
cleanup:
regfree(&reg);
ret:
return *chain;
}



As I am only testing this code now I don’t know if it will be faster – though everything tells me that it should be (though I am not entirely clear if this memory allocation issue is the real bottleneck in the program or whether it is just that a Pentium III is a Pentium III and there is nothing much I can do about that).

What I do know is that if it was not for the inspiration of Programming Pearls I probably would not even have bothered trying.

# Getting frantic

Image via Wikipedia

I have my second, self-imposed, deadline for my MSc project looming and I haven’t written a word.

I have written lots of software and might be close to actually running the experiments I need to be able to write up. Though I have had a frantic evening trying to work out why one of the programs I had written was not working properly – and have settled on a kludge in the meantime.

The code calls fork():

int main(int argc, char* argv[]) { FILE* outXML; if (argc < 2) return 0; /* must supply a file to execute */ srand(time(NULL)); pid_t forker = fork(); if (forker == 0) { //in the child process if (argc == 3) { ptrace(PTRACE_TRACEME, 0, 0, 0); execlp(argv[1], argv[2]); } else { ptrace(PTRACE_TRACEME, 0, 0, 0); execvp(argv[1], NULL); } return 0; }

char* childargs[argc - 1]; for (i = 1; i { len = strlen(argv[i]); childargs[i - 1] = malloc(len); strcpy(childargs[i - 1], argv[i]); } execvp(childargs[0], childargs);

But it failed to run properly – anyone know why?

# Writing more code to avoid writing any of the report?

Image by mrbill via Flickr

I have managed to churn out 160 lines of working C today – which I think is quite good going, though, according to this, maybe I could have churned out 400 of C++ or even 960 of Perl (I love Perl but the mind boggles).

My program will tell you how pages pages are present or how many have been swapped (it has the ability to do a bit more too, but I have not exploited that even though it is essentially there) – you can fetch it from github here: valext (the name reflects the idea that this was going to be an extension to Valgrind but then I discovered/realised that was never going to work).

Anyway, what I now have to face up to is whether I am churning out code to avoid actually writing any of my MSc project report – I have a feeling that some of that is going on and think I need to start rereading Writing for Computer Science – which helped me greatly with the proposal.