For the past few days I have been trying to understand how NUMA works and what its implications and challenges are for system design. Linux supports several memory placement policies on NUMA-capable machines, along with a user-space library for working with these policies, memory migration, thread placement, and so on. For the most part the library seems to get its information from the files in /proc/self and in /sys/devices/system/node. The latter location in the sys filesystem holds a wealth of information about the memory on the system.
Here is a simple program I wrote today that prints the amount of memory located at each "defined" memory node on the NUMA system. It uses the NUMA library, libnuma, and is a rather trivial piece of code. My next task (once I am done with the semester finals) is to play around with my parallel implementation of QuickSort using pthreads and see whether I can improve performance by manually controlling the locality of memory and thread execution.
I did come across one important piece of information that I did not know before: my laptop (DELL E6410) is NUMA-capable but has a single memory node holding all 8 GB of memory. This may well be the reason why the parallel QuickSort ran faster on my laptop than on some of the department machines with distributed memory banks and faster, truly multithreaded cores.
EDIT:
If you have the numactl package installed, you can also list NUMA-related hardware information using the following command:
numactl --hardware
Looking at the output, you can tell that it is the same information you would see under the various entries in the /sys/devices/system/node directory.