November 21, 2010

Simple shared memory using mmap()

I was working through some examples in The Linux Programming Interface. The idea of using shared memory bits between the children and the parent process for synchronization looked like a simple yet powerful one; here is an example program that does exactly that:


Note : to allocate the memory from the virtual memory space of the process, rather than mapping a file, you have to use MAP_ANONYMOUS.

October 29, 2010

China overtakes the US with its supercomputer

The tests performed and reported every six months for the Top 500 supercomputers list (http://www.top500.org/) have now confirmed that China is going to take the top position in the list coming out in a few weeks, leaving behind Jaguar, the current most powerful supercomputer, by a good margin. The supercomputer, named Tianhe-1A, is clocking nearly 2.4 petaflops.

It appears to be an improvement on the same machine which ranked 7th in the earlier rankings from June 2010. They seem to have developed a new interconnect which is reported to perform twice as fast as the common InfiniBand interconnects used in most supercomputers. Also, I am not sure about this, but the NYTimes report mentions the supercomputer using NVIDIA chips, while the June 2010 listing mentions the use of ATI (AMD) GPUs.

August 20, 2010

The Attack - An Interesting Scenario

I have been thinking about this scenario for quite some time. Assume a girl and two guys (friends) were on an adventure trip to some remote place when they were suddenly attacked by another guy. The attacker is twice as powerful as one of the guys, and four times as powerful as the other. Assume that the rate of decrease of each guy's power during a one-on-one fight follows the same law. Would the two guys be successful in protecting the girl? If they are to succeed, who should fight the attacker first?
There are essentially a lot of small facts and observations you should consider here: the rate of decrease in power depends on the time elapsed and on the initial power levels, and the rate is obviously not linear in time. So it is indeed intuitive that the attacker's power would decrease much faster than the other two guys' power would. Also consider the fact that I have taken the power ratio to be 1:2:4 in this case; what other power ratios in such a scenario would allow the girl to remain unharmed?
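One way to make "the same law" precise, purely as an illustrative assumption on my part, is a Lanchester-style model in which each fighter's power decays at a rate proportional to the opponent's current power:

```latex
\frac{dP_1}{dt} = -k\,P_2, \qquad \frac{dP_2}{dt} = -k\,P_1
\quad\Longrightarrow\quad P_1(t)^2 - P_2(t)^2 = \text{constant}
```

Under this particular law a duel is decided by the squares of the initial powers, and the winner walks away with power \(\sqrt{P_1(0)^2 - P_2(0)^2}\); other decay laws give different answers, which is exactly what makes the ordering question interesting.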
Note : one of the first people I posed this problem to asked me how the solution would change if the girl fled while the guys were fighting ;) , so for convenience assume that the girl is shocked by the attack and does not move, and it is entirely up to the two guys to protect her.

July 27, 2010

Instances of embedded devices on the cloud, or overkill ??

A short discussion with Aditya Vikram Thoomati has led me to believe that there is indeed some scope for running embedded and thin devices on the cloud. The topic might end up being controversial, and it boils down to a chicken-and-egg problem. The reason is that the concept of the cloud came about mostly because people wanted their bulky machines to run on a platform which was available from anywhere and could be accessed on thin clients. Now, as cloud technologies advance, people might start using and monitoring their cloud instances from embedded devices such as phones and tablets. That is a great change, but when you think about running instances of those same embedded devices on the cloud, it's recursion. And I don't see any good reason for running embedded device instances on the cloud; as I see it there are two complications:
  • One, there are far too many embedded devices and platforms, so giving general support for them on the cloud is going to be tough.
  • Second, providing an instance of an embedded platform might only prove useful to the board's testers or to a few application developers.
But still, for some reason, I see a good chance that developing cloud solutions which can provide emulated instances of a few standard embedded platforms, or of all of them, will be of some use.

July 19, 2010

Of Thinking beyond flat lines and surfaces !

We always think in terms of lines, circles and all kinds of Euclidean geometric shapes. But the world we live in is far from those perfect rules of Euclidean geometry. Is it always true that, given a line and a point not on it, you can draw only one line through that point parallel to the given line?
Try this simple experiment: find a free space in your garden or somewhere, fix a point as the centre, and then walk around the point in a circular path while tracing the path. Now fix five equidistant points on the circular path and join each of these points to the centre. Measure the angle subtended at the centre by each of the five segments of the circular path. What will be the sum of these angles? Should it not be 360 degrees? Why am I even asking this question? Think again! If it is not the standard value you expected, can you explain why? Check out the terms curvature, geodesic, Riemannian geometry, tensors.
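The quantity the experiment probes has a standard closed form (the Bertrand-Diguet-Puiseux theorem from Riemannian geometry): around a point of Gaussian curvature K, a geodesic circle of radius r has circumference

```latex
C(r) = 2\pi r \left( 1 - \frac{K r^2}{6} + O(r^4) \right)
```

so the five angles, each measured as arc length divided by the radius, add up to slightly less than 360 degrees on positively curved ground (a hilltop) and slightly more on a saddle.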

Your name, my name, someone's name and its Kolmogorov Complexity

While I was reading a paper on compression and machine learning, I came across the concept of similarity/dissimilarity measures and how they are defined empirically in terms of the Kolmogorov Complexity (KC). In simple terms, it is the length of the shortest program or algorithm required to output the particular string for which the complexity is being calculated. So, assuming that we are using natural language (English in this case) to output the required string, the KC for my name, "Satish Kumar Eerpini", is 20, since there are 20 characters. Now if I consider some other name, then the KC for that name would be the number of characters in the English representation of that name, and so on!
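To be precise, counting characters only gives an upper bound on the KC: the program "print the following 20 characters" costs the length of the string plus a constant for the print instruction itself, so

```latex
K(s) \le |s| + c
```

where the constant c depends on the language chosen but not on the string s.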
Now what really struck me was that it should be possible to write a program in some arbitrary language which can output any name, that is, a program whose KC is constant for any name. Now how can that be done? ... keep thinking, while I try my best to come up with something!

June 28, 2010

Abstracting the thought process to a higher level : about facebook and twitter status updates

How do you generally think? As in, how do you see things when you are thinking about something? It is a known fact (you can check for yourself) that whenever someone thinks, he/she uses the natural language he/she is most comfortable with to represent the thought process implicitly. So when I look at a lady, and if she is beautiful, the first thing that flashes in my mind is "too good" or "gorgeous" or "beautiful" or just "WOW". So the thoughts are in a language. Now let's see what this greatly misleading blog post is all about.


Higher abstraction ??
Yeah, that is exactly what I am talking about. There are some instances when we think about things not directly at their level, but in terms of how they would be expressed to someone else, or in a similar way. For example, suppose you are trying to lie that you did not have your breakfast. When you plan the lie, what flashes in your brain is not the absolute representation of the thought in some language; you won't think "I did not eat breakfast". Rather, the thoughts which appear are: "The breakfast was not good, so I did not have it", "I did not have time", "would he/she/they believe it", etc. So here you are thinking at a higher level which contains the lie, but is abstracted around other things, like whom you are lying to and what the other circumstances are. This is what I call abstracting the thought process to a level beyond the simple natural language representation of thoughts. Indeed this form of thinking is more common than the basic form.

Wasting yourself !
Now what has actually happened with the advent of SN and micro-blogging sites like facebook and twitter is that people have started thinking about everything they do, and everything they intend to do, in terms of these services. So a person addicted to these, before going for lunch, would think about how he would put it up on his facebook wall or update his status on twitter (about lunch!). He does think about every aspect of the lunch, but not really about his lunch, rather about how he would talk about it. So this is an even higher abstraction of the thought process. Now why is this wasting yourself (if you are doing this)? It is really simple: you are wasting too much time, thinking about too many things and considering too many factors for something as simple as going for lunch. Unfortunately all this happens passively, and it is often really tough to detect the symptoms, as the thought process is mostly just mental, not a visible process. So check if you are already doing this. Better late than never.

It does help !
You would be surprised, but this kind of thinking does really help in some corner cases. Think about a situation where someone is trying to put up a technical thought on one of these sites. He/she has to convert it into really simple terms. You might have heard or seen this line somewhere or the other: "If you can explain it to your grandma, you have understood it"; it works in a similar way. For example, I bet most people ("most") on my facebook friend list are no better than my grandma at understanding a few of my thoughts, so when I try to update something out of excitement, I have to convert it into really layman terms. That, in some cases (not exactly my case), helps people understand what they are thinking about better.

But there are really few useful things about this habit, and its good effects can easily be obtained in other ways, so you had better check whether you are already addicted!

June 25, 2010

how about a game of generic chess !

Let's look at how chess works: you have 64 positions on an 8x8 square board. There are 32 pieces, 16 in each of two groups. In each group half the pieces (the pawns) are of the same type; the other half are the rooks, knights, bishops, queen and king. Each piece has a set of rules by which it can take various positions among the remaining p positions, where p ranges between 32 and 64.
Now let us come up with a class of games which might be called generic-chess-type games. All these games are two-player games, with the following governing rules:
  • There is an n x n square board with n*n positions
  • There are n*n/2 pieces; n*n/4 belong to one group and the other n*n/4 belong to the other group
  • Among the n*n/4 pieces in each group, n*n/8 pieces are all of the same type, while the remaining n*n/8 pieces are all different
  • Each type of piece has a set of rules governing which position it takes up during alternate turns between the players. There are p possible positions constrained by the rules for each piece, where p ranges between n*n/2 and n*n. Or, if it can replace and oust the opposite player's piece from the board, there are again up to n*n possibilities constrained by the ousting rules.
  • One piece in each group is the target, once it is ousted from the board, the game is over.
Now, come, let's play a game!

May 14, 2010

Multi-Level Trees ( Stacked Circular Linked Lists ) : A Perspective

As I mentioned in my earlier post on fundamental work in computer science, I try to think about how simple things such as data structures can be visualized in a different or new manner, which can lead to an efficient solution for certain use cases. Before reading this post, there are a few things I must warn you about. First, the structure I describe here is quite far from an actual tree data structure; I call it a tree for different reasons. Second, the idea presented here might already have been thought of by someone and dumped for not being useful, so don't condemn the post pointlessly; this is just an effort to attract some peer review of my thought process.

Structure  
Now to the actual stuff. The ml-tree consists of various levels; the number of levels (k) is fixed and describes the tree. At each level there is a central node and there are n(i) nodes (where "i" is the level) connected in a circular doubly linked list. Each node in the circular doubly linked list is connected to the central node (though I have strong doubts about the need to connect all the nodes). The central node serves the purpose of performing "daily-wage" jobs for its level and is not supposed to hold any important data; all the data must be stored in the "nodes on the circle". Moreover, the central node can in some cases hold a hash of the data in the "nodes on the circle". In some cases it can also act as a cache for the data on that level (I will try to discuss this usage of the data structure as a distributed cache system in a separate post).
The central nodes of all the levels are in turn connected in a linked list as represented below.

Describing Parameters 
The structure is primarily described by the number of levels k. Secondly, the number of nodes at each level, n(i), also describes the structure. You might be wondering why I used a function-based notation for n(i) rather than a subscript-based notation; there is a subtle meaning here: the number of nodes at each level can also be given as a function, n(i) = i, or something like n(i) = i*i, and other such variants, depending on the use case and the requirement.
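To make the description concrete, here is one possible C layout (the field names are mine, and the level-i node count n(i) is carried as a function pointer precisely so that n(i) = i or n(i) = i*i can be plugged in):

```c
/* A node on the circle at some level: part of a circular
 * doubly linked list, with a link back to its central node. */
struct circle_node {
    void                *data;
    struct circle_node  *next;
    struct circle_node  *prev;
    struct central_node *centre;
};

/* The "daily-wage" node at the middle of each level; the central
 * nodes of successive levels form a linked list of their own. */
struct central_node {
    struct circle_node  *ring;   /* any node on this level's circle */
    struct central_node *below;  /* central node of the next level  */
    int                  level;  /* i, used to evaluate n(i)        */
};

/* The ml-tree itself: k levels, and n(i) given as a function. */
struct ml_tree {
    int k;
    int (*n)(int i);
    struct central_node *top;
};

/* Example node-count function: n(i) = i*i. */
static int square(int i) { return i * i; }
```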

Why is it named a ml-tree ?
Ok, this might be the most idiotic part of the whole post. The first inspiration I had for this structure was from a real-world tree, especially from the structural beauty of a pine tree (or something similar, a conical shape in plain terms). This ignited a further thought process (all the crap going on in my brain) on why a tree data structure is so different from a real-world "3-D" tree.
This structure can also be visualized as a stack of circles (or circular doubly linked lists, to be more precise), as my friend Harshit rightly pointed out, but that is only when the number of nodes at each level (n(i)) is a constant. Otherwise it leads to a stack of elements with dissimilar properties.

What next ?
I will try to write a separate post describing some possible use cases and why this structure would be an easier representation than something else. If you can think of something, do leave a comment. :)

May 13, 2010

Scope for Fundamental Work

In recent times I have come to believe that computer science, together with the fields which fall under its umbrella and which form its roots, has developed to the point where it is no longer possible to make a fundamental contribution. Or, to state it optimistically, it is nearly impossible to make fundamental discoveries in the field of computer science and the related tributary fields.
The first issue I think about when I have a new idea is: why wouldn't someone have already thought about this? Is this really on a different line of thought, or am I just trying to re-invent the wheel? One of the most valuable suggestions I ever got during my under-graduate studies was that "re-inventing the wheel is really bad". So the first thought is to prevent that, and in 99% of the instances I end up finding similar work already done.
By fundamental work above, I mean things like coming up with a simple yet efficient data structure, some variant of a tree, or a simple sorting technique, or a new algorithm for tackling distributed program segmentation and execution (like OpenMP); almost everything in these areas has already been touched. All this pseudo nonsense which I have written above originated when it struck me that a data structure which is structurally similar to a real-world (3-D) tree would be viable for some operations. Coming to a conclusion about the possibilities and the use cases of such a structure is far from my sight right now, but nevertheless there is a feeling that someone would have thought about this too, maybe not as a data structure, maybe as some representation in graphics for generating real-world structures. I will try to cover what I propose to design as an ml-tree (multi-level tree) in another post.

May 08, 2010

Checking fragmentation - part II

You should read the precursor to this post here if you want to make any sense out of what I am going to write. Anyway, I hacked the previous program further. Being a person who always craves some kind of visual representation, I decided to print the blocks of the file to the screen using a certain symbol for each block. Now this was fine for small files, but to really see some fragmentation happening you have to look at files on the order of a few hundred MB in size, and for such huge files printing a symbol for each block will clutter the screen. So I decided to make the program a little more intelligent: it instead prints the blocks to the screen using the following key:
@ represents 10000 contiguous blocks on the disk
# represents 1000 contiguous blocks on the disk
$ represents 100 contiguous blocks on the disk
& represents 10 contiguous blocks on the disk
* represents 1 block on the disk
So now I print the blocks that have been allocated to the file along with the blocks that do not belong to the file but fall in between. The blocks which belong to the file are printed in blue, while the ones which don't are printed in red.
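The symbol-printing step is easy to sketch; this is my reconstruction of the key above (colours omitted), writing the symbols into a caller-supplied buffer so the run can be printed in one go:

```c
/* Write the symbols for `count` contiguous blocks into buf
 * using the key: @ = 10000 blocks, # = 1000, $ = 100,
 * & = 10, * = 1. buf must be large enough; returns the
 * number of symbols written. */
int run_symbols(unsigned long count, char *buf)
{
    static const unsigned long unit[] = { 10000, 1000, 100, 10, 1 };
    static const char          sym[]  = { '@', '#', '$', '&', '*' };
    int len = 0;

    for (int i = 0; i < 5; i++) {
        while (count >= unit[i]) {      /* greedy, largest unit first */
            buf[len++] = sym[i];
            count -= unit[i];
        }
    }
    buf[len] = '\0';
    return len;
}
```

For example, a run of 12345 blocks comes out as one '@', two '#', three '$', four '&' and five '*'.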
I was searching for a highly fragmented file on my 1 TB external HD and found an iso image of about 1.2 GB which was fragmented at approximately 200 places (thanks to ntfs on the external HD). I ran the program on this file once, then copied the file to my laptop's HD (which has ext3, btw) and ran the program again. Here are the results; the excessive fragmentation is visible in the first case:



Tomorrow I am going to try to create an image with the block map for a file; I will try using the gd library for this. More results tomorrow :). BTW, you can get the program from here.

May 07, 2010

Do you think it is stored together ? (checking fragmentation)

The title might be quite misleading, hence I have put the real meaning in brackets. The post is about using ioctl with the FIBMAP operation to check the block numbers allocated to a file in sequential order. Whenever there is a jump in the block numbers, we know that there is some fragmentation in the way the file has been allocated on the disk. Looking at the code, it is pretty obvious that I have not written it (obvious from the usage of assert ;)); a little googling about FIBMAP landed me at this code. All I have done is modify the code to print only those block numbers where fragmentation appears to be happening. The code is:




Note: the FIBMAP operation is privileged, hence you will have to run the compiled program as root. I am also adding the results I got for a test run on a file about 702 MB in size.

My next task would be to fill up most of the free space in the filesystem and then create some large files and check out the fragmentation for those files. For more information on fragmentation you could check out the useful information here.

May 04, 2010

The Motivation and the Content that will be

On the day I left college, one of my juniors asked me whether I would update my blog regularly. I was quite wary of the answer back then, but now I know it. Yes, I will update it frequently, and here is the beginning of that: a description of the motivation for doing so and of what might primarily be found in the updates once I start blogging actively.
The primary motivation is to encourage open behaviour. I strongly believe that you cannot ask someone to do something you cannot do yourself. So if I want people to be open about their work, I need to do it first; that said, I have done this mostly through direct communication in the past few years. But since it is not possible to communicate directly with everyone, I want to describe my work on this blog.
Secondly, there happen to be some people who are at an early stage and need some point to start at, and for these people going through small tech write-ups and blogs is a great way to start off; indeed I was in that situation at one point in time. So that is the other reason I will write regularly.

If I start writing about everything I do from day to day, even just the technical stuff, I might very well run into hundreds of pages in less than a week. That is mainly because I don't learn by reading books and other manuals; rather, when I see something new, I try to experiment with it on my machine, and the next step is checking out the online man pages. Of course, there are things which have to be read from a book or some paper, but that is a different case. So the point is that I will not write about everything; I will only post about things which are very exciting, things which are great for beginners to get onto, and things which can lead to good or great work at the LUG@VIT or some other place like that ;).

Though not intentionally, I have discovered that most of the time what I speak and write appears to be Greek and Latin to many others. So if you feel a blog post should have been written with less technical complexity, then please let me know, via email or by commenting on the post. I will make an effort to make it simpler.

And a final note: this post is more to serve as a reminder to me whenever I feel lazy or want to withdraw from active blogging. I might think that I have already made promises to people (which might never happen ;)) and that it would be a bad idea to fall back on them. So if you think I am bragging too much about things, just ignore this post.

--
Satish

February 19, 2010

Why LR parsers for NLP with TAGs ? : a logical answer

It is known that context free grammars are often used to parse natural language strings; indeed most of the constructs in natural languages can be expressed using context free grammars. Among the most efficient parsing techniques are the LR parsing techniques, based on initial work by Donald Knuth. Now, the common shortcoming that keeps LR parsers from being used widely for natural languages is that when conflicts occur during parsing, the context provided by the associated context free grammar to resolve the conflict is very limited. Hence conflicts occur commonly when LR parsing techniques are used for natural languages, and it is not easy to remove the ambiguity involved.
Now consider the case of tree adjoining grammars: they fall into the category of mildly context sensitive formalisms, are more powerful than context free grammars, and, in simple terms, allow more than one level of derivation in each production. Given all these factors, a little logical thought tells us that conflicts in LR parsing using a TAG for natural languages can be resolved with much more ease because of the extra context present. Given this, I am presently working on writing an LR parser for TAGs; though implementations like XTAG already exist, I am doing this solely to understand the theoretical concepts better.

February 12, 2010

embedded push down automata and parsing MCS languages

I was working on coming up with a solution for parsing the following language using an embedded push down automaton; because of the presence of three distinct terminals, it is pretty obvious that a simple push down automaton cannot be used.
a^n b^n c^n
The main feature in which an embedded PDA differs from a simple PDA is that it has a stack onto which multiple stacks can be pushed. New stacks can be pushed above or below the current stack; while symbols can be pushed onto and popped from the current stack, symbols can only be pushed onto stacks other than the current one. This effectively means that in a simple PDA, when we perform a POP operation, the information is deleted from the PDA stack, whereas in an embedded PDA, when we perform a POP operation, the information can either be completely removed from the embedded PDA or be pushed onto another stack above or below the current stack. This allows an embedded PDA to parse languages such as the one above. The solution for parsing it is simple and can be visualized as a simple PDA for the a^n b^n part, except that instead of popping out the a's when a b is encountered in the input string, we push the symbol onto another stack; the symbols are finally popped out of the system when a corresponding c is encountered.
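That strategy is easy to simulate with two ordinary stacks standing in for the embedded PDA's stack of stacks; since each stack only ever holds one symbol type, plain counters suffice (an illustrative sketch, not a formal EPDA):

```c
/* Accepts exactly the strings a^n b^n c^n (n >= 1), simulating the
 * embedded-PDA strategy: a's are pushed onto the first stack, each
 * b moves one symbol to a second stack, and each c pops the second
 * stack. Each stack holds a single symbol type, so counters are
 * enough to model them. */
int accepts_anbncn(const char *s)
{
    long stack1 = 0, stack2 = 0;
    const char *p = s;

    while (*p == 'a') { stack1++; p++; }      /* push the a's            */
    while (*p == 'b') {                       /* move one symbol per b   */
        if (stack1 == 0) return 0;            /* more b's than a's       */
        stack1--; stack2++; p++;
    }
    while (*p == 'c') {                       /* pop one symbol per c    */
        if (stack2 == 0) return 0;            /* more c's than b's       */
        stack2--; p++;
    }
    /* accept only if the input is exhausted, both stacks are empty,
     * and the string was non-empty (started with an 'a') */
    return *p == '\0' && stack1 == 0 && stack2 == 0 && s[0] == 'a';
}
```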

February 11, 2010

Embedded push down automaton (testing LaTeX with Blogger)


Here is the formal definition of a configuration of an embedded push down automaton (the current state, the stack of stacks, the input read so far and the input remaining), which is used to define when a string has been successfully parsed:
C\left(M\right) = \left(q,\, {\gamma}_m \ldots {\gamma}_1,\, x_1,\, x_2\right) \in Q \times (\Gamma^+)^* \times \Sigma^* \times \Sigma^*
More equations and notes from my research on mildly context sensitive grammars and their describing automaton will make their way to the blog soon.

February 07, 2010

using getopt : is it worth the effort ?

I was recently going through the GNU getopt library for parsing command line options. On a *nix system with online man pages installed, you should be able to read the manual pages for the getopt C library functions by doing something like
man 3 getopt
Now what I am left wondering about is whether the complexity of using the library functions is worth it when the number of arguments your program requires is one or two. Whenever I have had to implement similar functionality in my programs, all I have done is use a loop over argv until I found all the required arguments. Something like this:
.
.
.
i = 1;                /* argv[0] is the program name, skip it */
num_args_found = 0;
while (i < argc) {
 if (argv[i][0] == '-'){
           /*check for all valid options */
           /*if invalid character found then skip and continue */
           num_args_found++;
 }
 i++;
}
if (num_args_found < required_num_of_args){
 printf ("The required arguments were not found\n");
 exit (1);
}
/*rest of the program here */
.
.
.
Now, I am not sure this is the best of approaches, and I have sometimes used a better or modified version of the above code, for example to show exactly which arguments were missing. But the point of discussion here is that when the expected number of arguments is less than three or four, using a library function is more complex than writing some custom code as above.

January 16, 2010

"du" with single level directory traversal

I have always used du to find the size used up by the files in a particular directory, mostly recursively. But there were times when I wanted to know the directory and file sizes only up to one level. Maybe "man du" would have helped, but, like most *nix users, I was too lazy for that. While reading a forum thread about the same question here, I came across the solution:

"du --max-depth=1"

and of course adding the "-h" switch would print the sizes in human readable form.

January 12, 2010

emulating keyboard and mouse input from s/w : The uinput system

The linux kernel exposes the uinput API for emulating keyboard and mouse input events from any program. The default uinput API is a low-level one, requiring most operations to be done through ioctl calls; a detailed explanation of its use can be found here.
The basic requirement for using the uinput system is that the uinput module be loaded (this can be checked using lsmod). From recent discussions on the ILUGC mailing lists, I have also come across libsuinput, which is a wrapper around the low-level API and provides easier emulation.
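To give a flavour of the low-level API, here is a minimal sketch that types the letter 'a' (the device name is illustrative, error handling is trimmed, and it needs the uinput module loaded plus root):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>
#include <linux/uinput.h>

/* Fill an input_event for the given type/code/value. */
struct input_event make_event(int type, int code, int value)
{
    struct input_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.type  = type;
    ev.code  = code;
    ev.value = value;
    return ev;
}

/* Create a uinput device capable of sending KEY_A,
 * then press and release the key once. */
int send_key_a(void)
{
    int fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* declare which events the virtual device can emit */
    ioctl(fd, UI_SET_EVBIT, EV_KEY);
    ioctl(fd, UI_SET_KEYBIT, KEY_A);

    struct uinput_user_dev dev;
    memset(&dev, 0, sizeof(dev));
    strcpy(dev.name, "demo-keyboard");    /* illustrative name */
    dev.id.bustype = BUS_USB;
    write(fd, &dev, sizeof(dev));
    ioctl(fd, UI_DEV_CREATE);

    struct input_event ev;
    ev = make_event(EV_KEY, KEY_A, 1);      write(fd, &ev, sizeof(ev)); /* press   */
    ev = make_event(EV_KEY, KEY_A, 0);      write(fd, &ev, sizeof(ev)); /* release */
    ev = make_event(EV_SYN, SYN_REPORT, 0); write(fd, &ev, sizeof(ev)); /* flush   */

    ioctl(fd, UI_DEV_DESTROY);
    close(fd);
    return 0;
}
```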

Thread automata, dependency structures for NLP and the project

The abstract :

Several constraints have been proposed that restrict the structures for syntactic parsing, such as projectivity, non-projectivity, planarity, well-nestedness, and gap and edge degrees. These classes can generally be characterized by the dependency structure characteristics themselves rather than by the grammar formalisms that generate these structures. Thread automata are a powerful but simple class of automata which are an extension of embedded push-down automata. They exhibit a wide range of parsing strategies for the so-called mildly context sensitive (MCS) formalisms. These formalisms are considered to be a little more powerful than context free but less powerful than context sensitive with respect to parsing and semi-linearity issues. In this project, we aim to connect the properties of dependency structures with the functionality of thread automata. If such a connection is established, it will increase the expressiveness and reduce the complexity of syntactic parsing of natural languages with free word order.


Most natural languages seem to be parsed by creating dependencies between the tokens as they are parsed, hence some constraints have been proposed for these dependency structures. The project deals with contemplating and validating the applicability of these constraints, if any, to thread automata.

January 07, 2010

Cron based script to change twitter profile image

I have been trying to write a simple python script using python-twitter which is triggered by cron and run, say, every 4 hours to change my twitter profile image. I could do something like having 6 profile images per day, each somehow representing the time of the day (or something similar).
Now I have been at the python-twitter API documentation for some time. It looks like I can set the profile image url to something I want using the User class and the function SetProfileImageUrl(self, image_url), but the problem is that the function manipulates the local values of the object (of type User) and the changes are not reflected on the twitter site. The same happens when I manipulate the status of the User object.
Is there some way to push the changes to the User object to the twitter site, or am I taking the wrong approach, or does python-twitter support changing the profile image at all?