Riscyforth: the work goes on

So after my last post somebody give me a “very poor” vote – I am consoling myself with the idea that was because they are really very keen for an assembly-based Forth to run on their RISC-V SBC and not because they thought it was an incoherent ramble.

In any case I have continued to work on Riscyforth – adding to its functionality and edging closer all the time to matching the Forth standard – getting close enough now to describe it as “Forth” and not just “Forth-like”.

Eight months ago, when I started, reading an online copy of “Threaded Interpretive Languages” which promised you could write your own TIL in a few weeks, I never considered using anything other than assembly to write it – just because I’m so old I thought that’s was the whole point of Forth – an interpreted language but written in machine code for speed.

I’ve learnt a lot on the way: including not to believe you can get from scratch to writing a working language in a few weeks – especially not in the world of virtual memory. Though I dare say, in the extremely unlikely event of someone commissioning me to write them a Forth from the ground up, I wouldn’t take eight months to get this far either next time.

One thing I have definitely learnt is the power and importance of unit testing – it matters quite a lot when you are putting together something on your own and can often make one assumption too many – a decent unit test soon clears you of that misapprehension.

What I haven’t got includes any users (never mind co-developers) or any useful or interesting programs.

On the former, RISC-V single board computers (SBCs) were promised to be widely available just about now, but the cancelling of the Beagle V project means that the more expensive Nezha is all that’s out there for now (I have one and it’s what is running in the image above). Compared to the ultra-cheap ARM boards (or even the high-end examples like some of the Raspberry Pi SBC kits) it’s fair to say RISC-V is not yet at the races, and the only users are likely to be people like me – hoping and waiting for the revolution to get here.

I picked Forth because the hope was it would give the first generation of RISC-V SBC hackers an easy way to get at their hardware and that is still the case, but there has to be some hardware to get at.

So I still have the sense I am working away on something that nobody else might ever want to use (the fact that it’s written in assembly also makes it fundamentally non-portable – I just cannot say ‘oh well let’s switch to the Raspberry Pi’!).

But if you are interested, current features of Riscyforth include:

  • Support for deeply nested loops and conditionals
  • Write your own words using : and ; (ie it is easy to extend the language by writing secondary words out of the existing primary words)
  • A custom memory allocator for arbitrary blocks of memory
  • Support for VARIABLE and CONSTANT
  • All (or almost all?) the stack operations you would expect
  • Hex, octal, decimal and binary support

The main thing currently missing is support for CREATE and a data space, but now I’ve grappled with the memory allocator (for general memory allocation) I will move on that. The listing of the test file might give you some ideas of what’s there, but even that is now behind where the code base is (yes, I know this isn’t how test-based development is meant to work!)

\ Unit tests

\ Stack operations tests

: testOVER2
." Testing OVER2 " 5 SPACES
10 30 50 90 70 OVER2
50 = SWAP 30 = AND IF ." OVER2 passed " else ." OVER2 FAILED " then cr

: testDrop2
." Testing DROP2 " 5 spaces
99 2 3 4
2 = if ." DROP2 passed " else ." DROP2 FAILED " then cr

: testSquare
." Testing SQUARE " 5 spaces
7 square
49 = if ." SQUARE passed " else ." SQUARE FAILED " then cr

: testCube
." Testing CUBE " 5 spaces
5 cube
125 = if ." CUBE passed " else ." CUBE FAILED " then 5 spaces
3 cube
28 = if ." CUBE FAILED " else ." CUBE passed " then cr

: testNIP2
." Testing NIP2 " 5 spaces
10 20 30 40 50 NIP2
50 = SWAP 40 = AND SWAP 10 = AND if ." NIP2 passed " else ." NIP2 FAILED " then cr 

: testDUP2
." Testing DUP2 " 5 SPACES
900 800 700 DUP2
700 = SWAP 800 = AND SWAP 700 = AND SWAP 800 = AND if ." DUP2 passed" else ." DUP2 FAILED " then cr

: testtuck2
." Testing TUCK2 " 5 spaces
1 2 3 4 5 6 TUCK2
6 = swap 5 = and swap 4 = and swap 3 = and swap 6 = and swap 5 = and if ." TUCK2 passed" else ." TUCK2 FAILED " then cr

: testswap2
." Testing SWAP2 " 5 spaces
1 2 3 4 5 6 7 8 SWAP2
6 = swap 5 = and swap 8 = and if ." SWAP2 passed " else ." SWAP2 FAILED " then cr

: testrot2
." Testing ROT2 " 5 spaces
99 2 3 4 5 6 ROT2
2 = swap 99 = and swap 6 = and if ." ROT2 passed " else ." ROT2 FAILED " then cr

." Testing DUP " 5 spaces
34 45 DUP
45 = SWAP 45 = AND swap 34 = AND IF ." DUP passed " else ." DUP FAILED " then cr ;

." Testing BL" 5 spaces BL 32 = if ." BL passed " else ." BL FAILED " then cr ;

." Testing DEPTH" 5 spaces
depth dup 0 < IF ." WARNING: Stack in negative territory " THEN 10 20 rot depth swap - 3 = IF ." DEPTH passed " ELSE ." DEPTH FAILED " then cr ;

." Testing INVERT " 5 spaces
-1 INVERT IF ." INVERT FAILED " ELSE hex 0xF0F0F0F00F0F0F35 INVERT 0xF0F0F0FF0F0F0CA = IF ." INVERT passed " ELSE ." INVERT FAILED " then then decimal cr ;

\ Basic tests

." Verifying TYPEPROMPT " cr

." Verifying GETNEXTLINE_IMM - please press RETURN only " cr

." Verifying OK " cr
OK cr

." Verifying TOKENIZE_IMM " cr

." Verifying SEARCH " cr

." Testing HEX " 5 SPACES
HEX 0x10 0xFF + DUP
0x10F = IF ." HEX passed with output 0x10F = " . ELSE ." HEX FAILED with output 0x10F =  " . then cr
\ ensure other tests keep testdup2

." Testing DECIMAL " 5 SPACES
20 = IF ." DECIMAL passed wth output 20 = " DUP . ." = " HEX . DECIMAL ELSE ." DECIMAL FAILED with output 20 = " DUP . ." = " HEX . DECIMAL THEN CR

." Testing OCTAL " 5 SPACES OCTAL 20 DUP DECIMAL 16 = IF ." OCTAL passed with output 20o = " DUP OCTAL . ." = " DECIMAL .
ELSE ." OCTAL FAILED with 20o = " DUP OCTAL . ." = " DECIMAL . THEN CR ;

." Verifying BINARY  - " 1 2 4 8 16 32 64 128 256 512
BINARY ." powers of 2 from 9 to 0 in binary... " cr
. cr . cr . cr . cr . cr . cr . cr . cr . cr . cr DECIMAL ;

." Verifying DOT " 5 spaces 0 1 2 3 4 5
." ... should see countdown from 5 to 0: " . . . . . . CR ;

." Testing ADD " 5 SPACES 900 -899 +
IF ." ADD passed " ELSE ." ADD FAILED " THEN CR ;

: testMUL
." Testing MUL " 5 spaces
5 5 5 * *
5 cube
= if ." MUL passed " else ." MUL FAILED " then cr

." Testing DIV " 5 SPACES 99 11 / 101 11 / * 81 =
IF ." DIV passed " else ." DIV FAILED " then cr ;

." Testing SUB " 5 spaces 
75 22 - 53 = IF ." SUB passed " else ." SUB FAILED " then cr ;

." Testing 1+ " 5 SPACES
10 1+ 11 = IF ." 1+ passed " ELSE ." 1+ FAILED " THEN CR ;

." Testing 2+ " 5 SPACES
10 2+ 12 = IF ." 2+ passed " ELSE ." 2+ FAILED " THEN CR ;

." Testing 1- " 5 spaces
-1 1- -2 = IF ." 1- passed " ELSE ." 1- FAILED " THEN CR ;

." Testing 2- " 5 SPACES
10 2- 8 = IF ." 2- passed " ELSE ." 2- FAILED " THEN CR ;

." Testing UNDERPLUS" 5 spaces
10 15 20 underplus 30 = if ." UNDERPLUS passed" else ." UNDERPLUS FAILED" then cr ;

." Testing MOD" 5 spaces
13 7 mod 6 = if ." MOD passed" else ." MOD FAILED" then cr ;

." Testing /MOD" 5 spaces
13 7 /mod 1 = swap 6 = and if ." /MOD passed " else ." /MOD FAILED" then cr ;

." Testing NEGATE" 5 spaces 13 negate -13 =
if ." NEGATE passed" else ." NEGATE FAILED" then cr ;

." Testing ABS" 5 spaces -13 abs 13 =
if ." ABS passed" else ." ABS FAILED" then cr ;

." Testing MAX and MIN" 5 spaces
20 10 dup2 MAX 20 = if ." MAX passed and " else ." MAX FAILED and " then min 10 = if ." MIN passed." else ." MIN FAILED." then cr ;

." Testing LSHIFT and RSHIFT" 5 spaces
10 4 lshift 160 = IF ." LSHIFT passed " ELSE ." LSHIFT FAILED " then 48 2 rshift 12 = if ." RSHIFT passed " ELSE ." RSHIFT FAILED " THEN cr ;

." Verifying WORDS .... " WORDS CR ;

." Testing LITERALNUMB .... " 213 213 = IF ." LITERALNUMB passed " ELSE ." LITERALNUMB FAILED " THEN CR ;

." Testing VARIABLE and VARIN (and @ and !)" 5 SPACES
IF ." VARIABLE, VARIN, @ and ! passed " ELSE ." VARIABLE, VARIN, @ and ! FAILED " THEN CR ;

." Testing CONSTANT " 5 SPACES

." Verifying GETLINE, TYPE and TIB " CR ." Please enter some text to be echoed back. " CR

." Testing CHAR" 5 spaces
char Z 90 = IF char z 122 = IF ." CHAR passed " else ." CHAR FAILED " then else ." CHAR FAILED " THEN cr ;

." Verifying SOURCE" 5 spaces
source type cr ;

\ Test if else then
." Testing IF ... ELSE ... THEN conditionals. " CR
1 if ." Simple IF passed " else ." Simple IF FAILED " then cr
0 1 if ." Testing nested IF... " if ." Nested IF FAILED " else ." Nested IF passed " then 5 5 * . then ." = 25 " cr
1 0 if ." Failed a final test of IF " else ." A final test of IF ... " if ." is passed " else ." is FAILED " then then cr ;

\ Stuff to test EXIT
EXIT ." If you see this EXIT FAILED " CR ;


\ Test return stack words

." Testing >R, R@ and R> along with RDROP" cr
34 35 36 >R >R >R R@ 34 = RDROP R@ 35 = AND RDROP R@ 36 = AND RDROP if ." >R, R@ and RDROP passed " else ." >R, R@ and RDROP FAILED" then cr
99 >R R> 99 = if ." R> passed " else ." R> FAILED " then cr ;

\ loop
." Testing BEGIN ... END loop " 5 SPACES
32 BEGIN DUP EMIT 1+ DUP 127 > END ."  BEGIN ... END passed " CR ;

." Testing BEGIN ... WHILE " 5 spaces
32 BEGIN DUP space hex . space decimal DUP 100 < IF DUP EMIT 1+ ELSE DUP 32 - EMIT 1+ WHILE DUP 110 = END ."  BEGIN ... WHILE passed " cr ;

." Testing DO ... LOOP " 5 spaces
1 10 1 DO DUP 1+ LOOP 10 = IF ." DO ... LOOP passed" ELSE ." DO ... LOOP FAILED" THEN CR ;

." Testing DO .... +LOOP" 5 SPACES
1 100  1 DO DUP 1+ 101 +LOOP 2 = IF ." DO ... +LOOP passed" ELSE ." DO .... +LOOP FAILED" THEN CR ;

." Verifying I and J in nested loops" CR
10 0 DO 10 0 DO ." ( "  J . ." , " I . ." ) " LOOP CR LOOP
." I and J verified" CR ; 

." Verifying LEAVE and UNLOOP " CR
10 0 DO 10 0 DO J I > J I = OR IF ." ( "  J . ." , " I . ." ) " ELSE UNLOOP LEAVE 3 0 DO LOOP 3 0 DO LOOP 3 0 DO LOOP THEN LOOP CR LOOP
." LEAVE and UNLOOP verified" CR ;

\ Testing memory functions
: ZZ ." ', EXECUTE and C! passed " ;

." Testing ', EXECUTE and C! " 5 spaces
hex 0x58 decimal ' ZZ 24 + C! ' XZ execute cr
\ Change back or else subsequent tests will break
." Testing one more time " 5 spaces
hex 0x5A decimal ' xz 24 + C! ' zZ exeCUTE  cr ;

: testcfetch 
." Testing C@" 5 spaces
' XOR 24 + c@ 88 = if ." C@ passed " else ." C@ FAILED " then cr ;

\ Dummy words to use in MOVE test
: ZM * ;
: ZD / ;
: reup decimal 68 ' ZM 25 + C! ;

." Testing MOVE " 5 spaces
10 10 ZM 100 = IF ' ZM 24 + ' ZD 24 + 24 move 100 2 ' ZM execute 50 = IF ." MOVE passed " else ." MOVE FAILED " then cr else ." Test failure " then reup ;

." Testing @ (and BASE)" 5 spaces
octal base @ 10 = hex base @ 0x10 = AND decimal base @ 10 = AND if ." @ and BASE passed" ELSE ." @ and BASE FAILED" then cr ;

." Testing +! " 5 SPACES ' ZM 24 + -1  SWAP +! 5 5 ' YM EXECUTE 25 = IF ." +! passed " ELSE ." +! FAILED " THEN 2 SPACES
' YM 24 + 1 SWAP ' +! execute 5 5 ' ZM EXECUTE 25 = INVERT IF ." +! address find FAILED " THEN  CR ;

." Testing PAD, FILL and ERASE " 5 SPACES
PAD 10 35 FILL PAD 3 + 1 ERASE PAD 2 + C@ 35 = PAD 3 + C@ 0 = AND PAD 4 + C@ 35 = AND IF ." PAD, FILL and ERASE passed" ELSE ." PAD, FILL and ERASE FAILED" THEN CR ;

\ Test groupings

\ Memory tests
." Testing memory manipulation words" cr
TESTINGTICK testcfetch testingmove testchar testfetch testplusstore TESTPADFILLERASE
." Testing of memory code complete" cr ;

\ Test loops
." Running tests of looping " cr
." Testing of loops complete" CR ;

\ Test Rstack
." Testing return stack" cr
." Testing return stack complete" cr ;

\ Test listwords
." Running 'listwords' group of tests " CR
." 'listwords' group of tests complete " CR ;

\ Test integer
." Running integer tests " cr
TESTminus2 testplus2 testunderplus testminmax testmod testslmod testabs testnegate testshifts
." Integer tests complete " CR

\ Group of stack operations tests
." Running stackop tests " cr
testDrop2 testSquare
testNIP2 testDUP2
testtuck2 testswap2 testrot2 testdup testbl testdepth
." stackop tests over " cr

\ Group of Basics tests
." Running basics tests and verifications " cr
." ***Any error message above can almost certainly be ignored*** " CR
." Verifying ENCSQ with this output " cr
." Verifying COMMENT " cr \ ." COMMENT verification FAILED " 
." Basics tests and verifications over " cr

\ Run all the tests
." Running unit tests " cr
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
." Press enter to continue " GETLINE CR
 ABORT" Verifying ABORTCOMM and leaving tests with this message "  ." ABORTCOMM has FAILED"

A RISC-V single board computer

I finally have a RISC-V based single board computer (SBC) – the Nezha from RV Boards shipped to me directly from China.

It’s tiny (about the same form-factor as a Raspberry Pi though) and relatively expensive (it cost me just over £100 to order and get it shipped here) but whilst I was slightly concerned I was being a bit naïve in buying it (as it was either a scam or it wouldn’t work), it boots (slowly) into Linux (see image) and works (slowly).

Can RISC-V cores (which are ‘open source hardware’ and free from licensing fees) break ARM’s grip on SBCs and similar devices? A year ago the answer looked like a very clear negative as, despite years of hype, RISC-V designs just weren’t moving off the page and into silicon. Now it looks much more uncertain as the Nezha is actually the second RISC-V SBC to ship (the other, the Beagle-V, has only been distributed to a select group of developers so far – and this didn’t include me despite my application – but is expected to be available globally in the autumn).

The plans are for RISC-V SBCs retailing for less than $20 inside a year and – crucially – for the RISC-V cores to feature vector extensions which could mean some interesting use-cases being opened up.

(If you want to know more about RISC-V or if you are thinking of starting a RISC-V assembly project I cannot recommend The RISC-V Reader highly enough.)

Right now I am trying to get Riscyforth to run on my machine.

The missing link and closing schools

London, where I am writing this, is now perhaps the global centre of the covid19 pandemic, thanks to a mutation of the virus that has allowed it to spread more easily. This mutation may not have come into existence in the South East of England but it has certainly taken hold here, and about 2% of London’s population currently have symptomatic covid.

In response all primary and secondary schools, which were due to open tomorrow, will be effectively closed and teaching will go online.

Suddenly the availability of computing resources has become very important – because unlike the Spring lockdown, where online teaching was (generally) pretty limited, this time around the clear intention is to deliver a full curriculum – and means one terminal per pupil. But even now how many homes have multiple computers capable of handling this? If you have two children between the ages of 5 and 18, and two adults working from home it is going to be a struggle for many.

Thus this could have been the moment that low cost diskless client devices came into their own – but (unless we classify mobile phones as such) they essentially don’t exist. The conditions for their use have never been better – wireless connections are the default means of connecting to the internet and connections are fast (those of us who used to use X/Windows over 28kbit dial-up think so anyway).

Why did it not happen? Perhaps because of the fall in storage costs? If the screen and processor costs haven’t fallen as fast as RAM and disk then thin clients get proportionally more expensive. Or perhaps it’s that even the fat clients are thin these days? If you have a £115 Chrome book then it’s probably not able to act realistically as a server in the way a laptop costing six times as much might.

But it’s also down to software choices and legacies. We live in the Unix age now – Android mobile phones and Mac OSX machines as well as Linux devices are all running some version of an operating system that was born out of an explicit desire to create an effective means to share time and resources across multiple users. But we are also still living in the Microsoft Windows era too – and although Windows has been able to support multiple simultaneous users for many years now, few people recognise that, and even fewer know how to activate it (especially as it has been marketed as an add-on and not the build in feature we see with Unix). We (as in the public at large) just don’t think in terms of getting a single, more powerful, device and running client machines on top of it – indeed most users run away at the very idea of even invoking a command line terminal so encouraging experimentation is also difficult.

Could this ever be fixed? Well, of course, the Chrome books are sort of thin clients but they tie us to the external provider and don’t liberate us to use our own resources (well not easily – there is a Linux under the covers though). Given the low cost of the cheapest Chrome books its hard to see how a challenger could make a true thin-client model work – though maybe a few educational establishments could lead the way – given pupils/students thin clients that connect to both local and central resources from the moment they are switched on?

Getting a job

I have, essentially, two sets of skills and experience.

One is as a political campaigner and communicator. I did well out of that for a while and more than that, did some things I am proud of and feel really privileged to have had a chance to be part of.

But it’s fair to say that road seems to have hit a dead end.  If you want to run a serious, progressive, campaign then I am certainly still interested, but I am not sure there is much of that out there today.

So then there are the other skills – ones that I am told are in high demand.

Namely as a software designer/writer/developer.

I can do this and I am much better these days than I used to be: unlike, say, running I am still getting faster and sharper. C/C++/Embedded/Perl/Linux/Groovy/DSLs/R/Deep Learning – I can tick all those boxes.

But where to begin? The reputation of IT recruitment agencies is pretty grim, though I have no direct experience. I have registered with one, but I am being sent invitations to be a senior C++ engineer in Berlin on a salary of €150,000 per annum which even I think is probably a bit over-ambitious for someone with no commercial experience.

(NB: If you want to see what I have done have a look at https://github.com/mcmenaminadrian).

Proprietary software as a false economy

By Eraserhead1, Infinity0, Sav_vas - Levenez Unix History Diagram, Information on the history of IBM's AIX on ibm.com, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1801948

I recently had to fill in a form for the Computer Science Department at the University of York.

Like, I am sure, any computer science department in any major world university, York is a “Unix shop”: research servers all run Linux and I guess the academics who aren’t using that are – as I am now – are running the modified/derived BSD that is Mac OS X.

But the form was “optimised” (i.e., only able to operate properly on) Microsoft Word – not a piece of software found on many ‘nix machines.

Because the rest of the University – like almost all of Britain’s public sector – was totally reliant on Microsoft’s proprietary offerings.

Thirty years ago I worked in a public sector organisation that used a mixture of proprietary software for “mission critical” work – Netware, Word Perfect and MS Dos. But even that mixture has gone: it’s Microsoft for everything (on the desktop) these days.

And now the price of that false economy – because so often this reliance on Microsoft has been justified because it keeps training costs low (“everybody knows how to use it”) – has been revealed by a massive global ransomware attack.

If free/open source software (FOSS) had been more-widely used then, of course, the risk would not have disappeared: not least because the crackers would have turned their attention to FOSS and left Windows behind: but there are two pretty obvious advantages to FOSS in terms of security:

  • You can see how it works – you wouldn’t walk across a bridge with no visible means of support, yet every time you use proprietary closed-source software you do just that: the fact it hasn’t fallen down yet seems like a poor justification.
  • Everybody can fix it: if Microsoft’s software breaks or is seen to have a vulnerability you are essentially reliant on them to fix it. And if you are using an unsupported piece of software you may not even have that. Again there are no guarantees of invulnerability with FOSS – software is hard – but there is a guarantee that you or anyone you ask/pay can attempt to fix your problem.

It’s time we ended this dependency on proprietary software and invested in a FOSS future.

Referenced in an academic text book

My MSc project on memory management in the Linux kernel has indeed been referenced in an academic text book – Computing Handbook, Third Edition: Computer Science and Software Engineering – you can look me up in the preview there. The article that references my work is by the great Peter J. Denning, the “father” of the working set.

The book (one volume of a set) costs £157 so I am not sure if I am going to buy it just yet.

Computer programming centre-right lovers of Swedish meatballs

English: YouGov logo
English: YouGov logo (Photo credit: Wikipedia)

According to YouGov (the UK’s largest polling company) that is what typical lovers of Linux are – though it’s based on just 272 individual profiles (out of 200,000 or so members of YouGov’s panel). Oh, and they are blokes. More at https://yougov.co.uk/profiler#/Linux/demographics

(YouGov made at least some of their profiling data available online this morning and it has kept British internet users amused all day.)

Windows lovers are, apparently, somewhat more numerous – there are 744 of them – but also typically younger, less well off and even more right wing. And are also men. Apple pie is their favourite dish and perhaps unsurprisingly they are not as keen on programming. Yes, it’s true: Windows lovers are lusers through and through. See https://yougov.co.uk/profiler#/Microsoft_Windows/demographics

Admirers of the Microsoft brand, though, tend to be older (still male) and rather more centrist – and numerous. Perhaps this is the Bill Gates effect? People admire his creation in the abstract but there is little concrete love. See https://yougov.co.uk/profiler#/Microsoft/demographics

But what of your favourite hipster computer brand – Apple? Turns out they are centrist, female and middle class and like grilled halloumi cheese. It’s harder to make a direct comparison though as (surprise, surprise) Apple users don’t seem to identify their operating system. See https://yougov.co.uk/profiler#/Apple/demographics

There is lots more to look at – for instance Android users are seemingly very left wing while computer scientists are middle aged men who eat a lot of chicken.

Scale of the task

English: Messages from the Linux kernel 3.0.0 ...
English: Messages from the Linux kernel 3.0.0 booting, from Debian sid i386. (Photo credit: Wikipedia)

I have had a frustrating few days trying to get to grips with two new pieces of the technology: the OVP simulator and the Microblaze processor.

Finally I think the fog is beginning to clear. But that also reveals just what a task I have in front of me: namely to write some kernel code that will boot the Microblaze, establish a virtual memory system and then hand over control to user code, which will have to trap memory faults and pass control back to the privileged kernel.

It is not quite writing an operating system, even a simple one, but it is actually undertaking to write what would be at the core of an OS.

Of course, there are lots of places to borrow ideas from – not least the Linux kernel – but it’s a bit daunting, if also reasonably exciting.

Preciously little books about to help – I shelled out to buy this (having borrowed it from the York Uni library and found it to be an excellent general introduction to the area) – but it’s not a guide to OVP never mind to the Microblaze. If anyone does know of a book that does either I’d be very grateful (maybe it’s my age but electronic books are very much second best to me – you just cannot flick from page to page looking for that key element you read the other day and so on.)

Give yourself a Christmas present: learn sed

English: A Shebang, also Hashbang or Sharp ban...
A Shebang, also Hashbang or Sharp bang. (Photo credit: Wikipedia)

Text is at the core of The Unix Way – and all True Unix Hackers work from the command line. This much you know.

(If you don’t get a copy of The Art of Unix Programming – there is an awful lot of rubbish in that book but it does do one thing well: explain the deep connection between text and Unix.)

In a practical sense this means to get the best from your Unix system (and this includes you if you are a Mac OSX user) you need to boost your command line skills. The first thing to do is, of course, become familiar with a text editor – either vi or emacs (I am a vi user, but refuse to engage in a religious war on this matter.)

Then, perhaps not the next thing, but one of the next things you should do is learn sed – the streaming editor – one of the many gifts to the world (including Unix, of course) from Bell Labs (I recently read The Idea Factory: Bell Labs and the Great Age of American Innovation and I suppose I really ought to get around to writing a review of that).

Sed comes from the 1970s, but as so often in computing, it feels to me that its time has come again – in the era of big data a program that allows you to edit a file one line at a time – as opposed to trying to read as much of a file as possible into your computer’s memory – has come round again.

If you are sufficiently long in the tooth to have messed about with Microsoft’s edlin or any other line editor you might be forgiven for giving a hollow laugh at this point – but sed is a tool that genuinely repays the effort you have to make to learn it.

In the last few weeks I have been messing about with 220GB XML files and even the University of York’s big iron compute server cannot handle a buffered edit of a file that size – sed is the only realistic alternative (actually I thought about using my own hex editor – hexxed – which is also essentially a line editor, but a hex editor is really for messing about with binary files and I wouldn’t recommend it.

Sed has allowed me to fix errors deep inside very large files with just a few commands – eg:

LANG=C sed ‘51815253s@^.*$@<instruction address=\’004cf024\’ size=’03’ />@’ infile.xml >outfile.xml

Fixes line 51,815,253 in my file (the line identified by an XML fatal error). Earlier I had executed another line of sed to see what was wrong with that line:

LANG=C sed -n ‘51815253p’ infile.xml

(The LANG=C prefix is because the breakage involved an alien locale seemingly being injected into my file.)

Sed allows you to do much more – for instance anything you can identify through a pattern can be altered. Let’s say you have (text) documents with your old email address – me@oldaddress.com – and you want to change that to your new address – me@newaddress.com …

sed ‘s/me@oldaddress\.com/me@newaddress\.com/g’ mytext.txt > newtext.txt

Then check newtext.txt for correctness before using mv to replace the original.

But there is much, much more you can do with it.

Plus you get real cred as a Unix hacker if you know it.

Now, too many programs these days – especially anything from Redmond – go out of their way to suppress text formats. Text, after all, is resistant to the “embrace and extend” methodology – text wants to be free. But there is plenty of it out there still.

Books that teach you about sed are not so plentiful – I have been reading an old edition of sed & awk – which seems to be out of print – though you can buy a second hand copy for less than a quid excluding postage costs. Well worth the investment, I’d say.

Third time lucky?

Last time we met, my PhD supervisor told me to expect to spend a long time making things that didn’t work: it

Crashed payphone -- Linux kernel panic
Crashed payphone — Linux kernel panic (Photo credit: sethschoen)

certainly feels like that right now.

My current task is to build a logical model of a working memory allocation scheme for a NoC.

I started with some Groovy, then realised that was going nowhere – how could I test these Groovy classes? I could write a DSL, but that felt like I’d be putting all the effort into the wrong thing.

My next thought was – write some new system calls for an experimental Linux kernel. Well, that has proved to be a pain – writing system calls is a bit of a faff (and nowhere does it seem to be fully documented – for a current kernel as opposed to a 2.2 one! – presumably because nobody should really be writing new Linux system calls anyway and so its knowledge best confined to the high priests of the cult) and testing it is proving to be even more difficult: it’s inside a VM or nothing.

Then I thought this afternoon – why bother with the kernel anyway – if I wrote a userland replacement for malloc that allocated from a fixed pool that should work just as well – so that is what I am about to try.