Reaching a decision


English: Distributed Memory
English: Distributed Memory (Photo credit: Wikipedia)

A week further on and not much C++ has been written – and now I think I need to make a new start.

Up to this point I have been trying to write a software model of the hardware and my thought was I could think put a software-modelling layer on top of that. But that simply is not going to work – it is just too complex.

Instead I am going to have to make some policy decisions in the software – essentially over how I model the local memory on the chip: each tile will process memory reads and writes and needs to know where that memory is – it could be in the global off-chip memory store or it could be on-chip.

The difference matters because, at least in theory, the on-chip memory is speedily accessible, while the off-chip memory is 50 to 100 to 500 times “further away”.  Because memory accesses exhibit locality it makes sense to ship blocks of addressed memory from the global to the local store – but doing so takes time and if there are a lot of memory movements then we get thrashing.

What I now have to do is think of what policy I will use to decide what memory gets stored locally (or, more likely, what policy I use to map the addresses). I’ll start by once again reviewing papers that propose some schemes for existing Networks-on-Chip.

In other news: I have had a paper (of which I am co-author and first named author) accepted by OSPERTS 15 – so I will be off to Sweden to get mauled by the audience there in early July. It will be an experience, and I am looking forward to it, but I also think it might be not so much a baptism, but a destruction by fire.

Advertisements

Further thoughts on the simulation task


A Motorola 68451 MMU (for the 68010)
A Motorola 68451 MMU (for the 68010) (Photo credit: Wikipedia)

Lying in bed this morning and puzzling over what to do …

At first I thought what I should do is copy one of the existing operating system models for NoCs, but that simply would not be flexible enough.

What I have to do is model the hardware (including the modifications to the MMU I want to see) as, essentially, some form of black box, and build other layers – including the memory tree – above that. That means I need to separate the CPU/tiles from global memory: sounds simple in theory but implementing this is going to be very far from easy.

Struggling


Die of an Intel 80486DX2 microprocessor (actua...
Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging. (Photo credit: Wikipedia)

Been a while since I’ve written here – been avoiding writing about politics, which has obviously not been so great for me in the last couple of weeks… but now I have something else to ruminate on.

I have reached a milestone, or perhaps basecamp, in my PhD research: having a model for memory management that needs further exploration. (Hopefully there may even be a paper on this soon.)

Some of that exploration will have to be in hardware, and that’s really beyond me but I can and should build a software model to test how a many core system built using this new model might operate.

So far I have been testing or proving concepts with OVPSim but it does not allow me to build a true asynchronous multi-core model, so I need to do that in software myself.

But where to begin – I have a list of classes that you might want to have in C++:

  • Computer – which would aggregate…
    • DRAM
    • Storage
    • NoC – which would aggregate…
      • Mesh
      • Tiles – which would aggregate…
        • CPU
        • Cache
        • Ports (to the Mesh)

I hope you can see how quickly this becomes complex – and all we are talking about here is a simple software framework to allow me to do the research (ie., delivering the software, complex as it is, is only the very start.)

I am struggling to know where to begin – C++ seems like the logical choice for this, but it’s not proving to be much fun. Particularly because my CPU class has to be able to “execute” some code – I thought about using a DSL but may stick to the XML output I got from my hacked Valgrind Lackey – as at least I can then use existing XML engines.

Should I build from the XML up – eg., get a CPU class that can hack the XML and pass the requests up the chain (eg via the cache up to the Mesh and up to the DRAM etc), or what?