As I had mentioned in my earlier post on fundamental work in computer science, I try to think about how simple things such as data structures can be visualized in a different or new manner which can lead to an efficient solution for certain use cases. Before reading this post, there are a few things I must warn you about. First, the structure I have described here is much further from an actual tree data structure than anything else, I call it a tree for different reasons. Second, the idea presented here might have already been thought about by someone and dumped for not being any useful, so don't condemn the post pointlessly, this is just an effort to attract some peer review on my thought process.
Structure
Now to the actual stuff. The ml-tree consists of various levels, the number of levels (k) is fixed and describes the tree. At each level there is a central node and there are n(i) nodes (where "i" is the level) which are connected in a circular doubly linked list. Each node in the circular doubly linked list is connected to the central node (though I have strong doubts about the need to connect all the nodes). The central node serves the purpose of performing "daily-wage" jobs for its level and is not supposed to hold any important data, all the data must be stored in the "nodes on the circle". More so the central node in some cases can hold a hash for the data in the "nodes on the circle". In some cases it can also act as a cache for the data on that level (I will try to discuss this usage of the data structure as a distributed cache system in a separate post).
The central nodes of all the levels are in turn connected in a linked list as represented below.
Describing Parameters
The structure is primarily described by the number of levels k. Secondly, the number of nodes at each level n(i) also describes the structure. You might be wondering why I used a function based notation for n(i) rather than a subscript based notation, there is a subtle meaning here, the number of nodes at each level can also be passed as a function, n(i) = i , or something like n(i) = i*i , and other such variants depending on the use case and the requirement.
Why is it named a ml-tree ?
Ok this might be the most idiotic part of the whole post, the first inspiration I had for this structure was from a real world tree, more so from the structural beauty of a pine tree (or something similar, a conical shape in plain terms). This ignited a further thought process (all the crap going on in my brain) on why a tree data structure is so different from a real world "3-D" tree .
This structure can also be visualized as a stack of circles (or circular doubly linked lists to be more precise) as my friend Harshit rightly pointed out, but that is only when the number of nodes at each level (n(i)) is a constant. Otherwise it leads to a stack of elements with dissimilar properties.
What next ?
I will try to write a separate post describing some possible use cases and why this structure would be an easier representation than something else. If you can think of something, do leave a comment. :)
May 14, 2010
May 13, 2010
Scope for Fundamental Work
In recent times I have come to believe that to the extent computer science and the fields which fall under its umbrella and which form its roots, have developed, it is not possible any more to make a fundamental contribution. Or to state it optimistically, it is nearly impossible to make fundamental discoveries in the field of computer science and the related tributary fields.
The first issue I think about when I have a new idea is, why wouldn't have someone already thought about this, is this really on a different line of thought, or am I just trying to re-invent the wheel. One of the most valuable suggestions I ever got during my under-graduate studies was that, "re-inventing the wheel is really bad". So the first thought is to prevent that, and in 99% of the instances I end up finding similar work already done.
By fundamental work above, I mean to point at things like coming up with a simple yet efficient data structure, some variant of a tree, or coming up with a simple sorting technique, or coming up with a new algorithm for tackling distributed program segmentation and execution (like openMP), almost everything in these aspects has already been touched. All this pseudo nonsense which I have written above originated when it struck me that a data structure which is structurally similar to a real word tree (3-D) would be viable for some operations. Coming to a conclusion about the possibilities and the use cases of such a structure is far from my sight right now, but nevertheless there is a feeling that someone would have thought about this too, may be not as a data structure, may be as some representation in graphics for generating real word structures. I will try to cover what I propose to design as a ml-tree (multi-level tree) in another post.
The first issue I think about when I have a new idea is, why wouldn't have someone already thought about this, is this really on a different line of thought, or am I just trying to re-invent the wheel. One of the most valuable suggestions I ever got during my under-graduate studies was that, "re-inventing the wheel is really bad". So the first thought is to prevent that, and in 99% of the instances I end up finding similar work already done.
By fundamental work above, I mean to point at things like coming up with a simple yet efficient data structure, some variant of a tree, or coming up with a simple sorting technique, or coming up with a new algorithm for tackling distributed program segmentation and execution (like openMP), almost everything in these aspects has already been touched. All this pseudo nonsense which I have written above originated when it struck me that a data structure which is structurally similar to a real word tree (3-D) would be viable for some operations. Coming to a conclusion about the possibilities and the use cases of such a structure is far from my sight right now, but nevertheless there is a feeling that someone would have thought about this too, may be not as a data structure, may be as some representation in graphics for generating real word structures. I will try to cover what I propose to design as a ml-tree (multi-level tree) in another post.
Labels:
computer science,
content,
data structures,
ml-tree,
motivation,
philosophy
May 08, 2010
Checking fragmentation - part II
You should read the pre-cursor to this post here if you want to make any sense out of what I am going to write here. Anyway, I hacked the previous program further, being a person who always craves for some kind of visual representation, I decided to print the blocks of the file to the screen using a certain symbol for each block. Now this was fine for small files, but to really see some fragmentation happening you have to look at files in the order of a few hundreds of MB in size, and for such huge files, printing a symbol for each block will clutter the screen. So I decided to make the program a little more intelligent, it will instead print the blocks to the screen using the following key :
@ represents 10000 contiguous blocks on the disk
#represents 1000 contiguous blocks on the disk
$ represents 100 contiguous blocks on the disk
& represents 10 contiguous blocks on the disk
* represents 1 block on the disk
so now I print the blocks that have been used to allocate for the file and the blocks that do not belong to the file but which fall in between. The blocks which belong to the file are printed in blue while the ones which don't are printed in red.
I was searching for a highly fragmented file on my 1 TB external HD and found an iso image of about 1.2 GB which was fragmented at approx 200 places (thanks to ntfs on the external HD), so I ran the program on this file once and then copied the file to my laptop's HD(which has ext3 btw) and ran the program again, here are the results, the excessive fragmentation is visible in the first case:
tomorrow, I am going to try to create an image with the block map for a file, I will try using the gd library for this. More results tomorrow :). BTW, you can get the program from here.
@ represents 10000 contiguous blocks on the disk
#represents 1000 contiguous blocks on the disk
$ represents 100 contiguous blocks on the disk
& represents 10 contiguous blocks on the disk
* represents 1 block on the disk
so now I print the blocks that have been used to allocate for the file and the blocks that do not belong to the file but which fall in between. The blocks which belong to the file are printed in blue while the ones which don't are printed in red.
I was searching for a highly fragmented file on my 1 TB external HD and found an iso image of about 1.2 GB which was fragmented at approx 200 places (thanks to ntfs on the external HD), so I ran the program on this file once and then copied the file to my laptop's HD(which has ext3 btw) and ran the program again, here are the results, the excessive fragmentation is visible in the first case:
tomorrow, I am going to try to create an image with the block map for a file, I will try using the gd library for this. More results tomorrow :). BTW, you can get the program from here.
Labels:
C,
ext3,
filesystems,
fragmentation,
ioctl,
linux,
ntfs
May 07, 2010
Do you think it is stored together ? (checking fragmentation)
The title might be quite misleading, and hence I have put the real meaning in the brackets. The post is about using ioctl with the operation FIBMAP to check the block numbers allocated to a file in a sequential order. Whenever there is a jump in the block numbers, we know that there is some fragmentation in the way the file has been allocated on the disk. Looking at the code, it is pretty obvious that I have not written it (it is obvious from the usage of assert ;)), a little googling about FIBMAP landed me at this code. All I have done is modify the code to print only those block numbers where there appears to be fragmentation happening.The code is :
note : the FIBMAP operation is privileged, hence you will have to run the program once compiled as root. I am also adding the results I got for a test run on a file about 702 MB in size.
My next task would be to fill up most of the free space in the filesystem and then create some large files and check out the fragmentation for those files. For more information on fragmentation you could check out the useful information here.
note : the FIBMAP operation is privileged, hence you will have to run the program once compiled as root. I am also adding the results I got for a test run on a file about 702 MB in size.
My next task would be to fill up most of the free space in the filesystem and then create some large files and check out the fragmentation for those files. For more information on fragmentation you could check out the useful information here.
Labels:
C,
fibmap,
filesystems,
fragmentation,
ioctl,
linux
May 04, 2010
The Motivation and the Content that will be
On the day I left college, one of my juniors asked me whether I would update my blog regularly, I was quite weary about the answer back then, but now I know the answer. Yes, I will update it frequently and here is the beginning of that describing the motivation for doing so and what primarily might be found in the updates once I start blogging actively.
The primary motivation is to encourage an open behaviour, I strongly believe that you cannot ask someone to do something you cannot do yourself. So if I want people to be open about their work, I need to do it first, and that said, I have done this mostly through direct communication in the past few years. But now since, it is not possible to directly communicate with everyone, I will want to describe my work on this blog.
Secondly, there happen to be some people who are in an early stage and need some point to start at, and for these people going through small tech write-ups and blogs is a great way to start off, indeed I was in that situation at one point of time. So that is the other reason I will write regularly.
If I start writing about everything I do from day to day, even just the technical stuff, I might very well run into hundreds of pages in less than a week, that is mainly because I don't learn by reading up books and other manuals, rather when I see something new, I try to experiment that on my machine, the next step is checking out the online man pages. Of course, there are things which have to be read from a book or some paper, that is a different case. So the point is that I will not write about everything, I will only update on things which are very exciting, things which are great for beginners to get onto, and things which can lead to good or great work at the LUG@VIT or some other place like that ;).
Though not intentional, I have discovered that most of the times what I speak and write appears greek and latin to many others. So if you feel a blog post should have been written with lesser technical complexity, then please let me know, via email or by commenting on the post. I will make an effort to make it simpler.
And a final note, this post is more to serve as reminder to me whenever I feel lazy or want to withdraw from active blogging, like I might think that I have already promised people(that might not ever happen ;) ) and might be a bad idea to fall back so if you think I am bragging too much about things, just ignore this post.
--
Satish
The primary motivation is to encourage an open behaviour, I strongly believe that you cannot ask someone to do something you cannot do yourself. So if I want people to be open about their work, I need to do it first, and that said, I have done this mostly through direct communication in the past few years. But now since, it is not possible to directly communicate with everyone, I will want to describe my work on this blog.
Secondly, there happen to be some people who are in an early stage and need some point to start at, and for these people going through small tech write-ups and blogs is a great way to start off, indeed I was in that situation at one point of time. So that is the other reason I will write regularly.
If I start writing about everything I do from day to day, even just the technical stuff, I might very well run into hundreds of pages in less than a week, that is mainly because I don't learn by reading up books and other manuals, rather when I see something new, I try to experiment that on my machine, the next step is checking out the online man pages. Of course, there are things which have to be read from a book or some paper, that is a different case. So the point is that I will not write about everything, I will only update on things which are very exciting, things which are great for beginners to get onto, and things which can lead to good or great work at the LUG@VIT or some other place like that ;).
Though not intentional, I have discovered that most of the times what I speak and write appears greek and latin to many others. So if you feel a blog post should have been written with lesser technical complexity, then please let me know, via email or by commenting on the post. I will make an effort to make it simpler.
And a final note, this post is more to serve as reminder to me whenever I feel lazy or want to withdraw from active blogging, like I might think that I have already promised people(that might not ever happen ;) ) and might be a bad idea to fall back so if you think I am bragging too much about things, just ignore this post.
--
Satish
Subscribe to:
Posts (Atom)