What would you like to Read today?

Minekey was a wondrous experience. It all started about a year and a half ago, when Delip came to IIT Kharagpur with a vision to solve the world’s information overload problem. The aim was simple: let content consumers get access to the information they are interested in.

If we look at the world today, most content producers and aggregators produce content for the general audience. That means that the news will be of all flavors, all topics and categories. However, it also means that if you wish to track news on a particular topic from a plethora of sources (and there is no dearth of them!), it is like finding a needle in a haystack, or a key in an information mine – which is incidentally where Minekey got its name from (if you couldn’t guess it already ;-).

As it goes, a handful of supercharged students from IITKGP got together with Delip and Prof. Sudeshna Sarkar and started working on this new project, with a new model for incubation. The company was based in Santa Clara, and we were doing the R&D and the initial bootstrapping from Kharagpur.

Ideas flew: personalization, networking, recommendation, search, social connections, groups, communities, feedback, ranking, clustering, collaborative clustering, geo-personalization, you name it. After lots of ideas, debates, deliberations, re-iterations, progress spreadsheets, throwaway code, reused old code, open source services, and many days and months, we had a strategy in place. We had created a news portal for the world at large (still an Alpha), we had strong customer leads, and a model seemed to be emerging.

A lot of water has since flowed in the Ganges. There is a definite strategy, the company is well funded, there are people who have left plum jobs to work at Minekey, the business model has been refined, and Minekey is staring at an immense opportunity. Kudos to the current team for taking a prototype and building a real product.

Minekey has now launched recommendations for blogs! You get a sweet-looking widget on your sidebar in a matter of minutes, and your friends can get recommendations. Minekey monitors their clicks, and as users click through from the widget, the personalization kicks in to recommend more and more stories according to each user’s taste. (Sadly, WordPress doesn’t support JavaScript, otherwise you would have seen one right here. I need to work on a workaround.)

Go get it now!


Serial Writers on the Loose

Serialization is the process of writing data out to a storage medium in a pre-defined format so that it can be read back programmatically. Even though there can be many forms of serialization (even using a database is serialization in some sense), in most cases it is used to persist the program state to a file and read it back, so that computation can proceed from an intermediate state.

I remember a time when I used to fret and worry about having to decide on a format to write my data in, then write procedures for writing it out to a file and reading it back, and then hunt for the minute bugs which crop up in this process. Developer productivity goes for a six while the manager manages a sinister laugh on the sidelines as the employee sweats it out with printf and scanf.

No more! Thankfully there is something called object serialization that comes to our rescue. With the dawn of languages such as Java and the .NET platform, serialization was made very simple, in fact so simple that little cherubs (in their garden playing harps or rather System.IO.Harp) could just add a single attribute to the class name and all the perspiration is taken care of by the framework.

In the .NET framework, serialization can be very simply implemented by adding the [Serializable] attribute to the class. Any fields that one doesn’t want to be serialized can be marked with a [NonSerialized] attribute. Thereafter, you just use a BinaryFormatter to format data into a suitable binary format, open a FileStream to write it out to a file, and bingo, you’re done!
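To make that concrete, here is a minimal sketch of the attribute-based approach. It assumes the classic .NET Framework, where BinaryFormatter is available (it is marked obsolete in newer .NET releases). The Settings class, its fields and the BasicDemo helper are hypothetical names made up for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical example type: the attribute is all the framework needs.
[Serializable]
public class Settings
{
    public string Name;
    public int Level;

    // Skipped by the formatter; comes back as the type's default (null here).
    [NonSerialized]
    public string ScratchNote;
}

public static class BasicDemo
{
    // Write the object to a file in the framework's binary format.
    public static void Save(Settings settings, string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Create))
        {
            new BinaryFormatter().Serialize(stream, settings);
        }
    }

    // Read the object back; the caller casts to the expected type.
    public static Settings Load(string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open))
        {
            return (Settings)new BinaryFormatter().Deserialize(stream);
        }
    }
}
```

After a Save followed by a Load, Name and Level should come back intact, while ScratchNote should be null because it was never written.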

However, I was not convinced it could be so easy. The objects typically form a graph structure, which might be cyclic. Would the automatic implementation of .NET serialization be able to handle this cyclic structure, or would I need to fish DFS and BFS out of my old dusty textbooks, traverse the objects in an acyclic manner and write them out in a suitable order? All my worries were misplaced! It just works… it’s magical. You can be a dork and use the object serialization in .NET!
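A quick way to convince yourself is to round-trip a two-node cycle through a MemoryStream. This is only a sketch, again assuming the .NET Framework's BinaryFormatter; the Node class and the CycleDemo helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical linked node; Next may point back to an earlier node.
[Serializable]
public class Node
{
    public string Label;
    public Node Next;
}

public static class CycleDemo
{
    // Serialize the graph reachable from root and read it straight back.
    public static Node RoundTrip(Node root)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, root);
            stream.Position = 0; // rewind before reading
            return (Node)formatter.Deserialize(stream);
        }
    }
}
```

If a.Next is b and b.Next is a, then for copy = CycleDemo.RoundTrip(a) the check ReferenceEquals(copy.Next.Next, copy) should hold: the formatter tracks object identity, so the cycle is rebuilt rather than unrolled.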

There is a small catch though. Static fields of a class are not serialized automatically (among some other gotchas). To deal with these special cases, you need to implement the ISerializable interface, providing a custom implementation of the GetObjectData method and a constructor which takes a SerializationInfo as a parameter (See some examples here). In my simple case, I had an identifier for every object, which I allocated from a static integer called idCount. Hence, in the custom constructor, I set idCount to one more than the highest id value I had seen so far during deserialization, so that I don’t generate old id values again.

id = info.GetInt32("id");
if ((this.id + 1) > idCount)
    idCount = (this.id + 1);
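For context, here is a sketch of how a whole class built around that snippet might look. Only id and idCount come from my code above; the Item class, its Name field, the ItemStore helper and the ResetCounter method (which just simulates a fresh process) are hypothetical. It assumes the .NET Framework's BinaryFormatter.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Item : ISerializable
{
    // Static, so the formatter never writes it; it is rebuilt while loading.
    private static int idCount = 0;

    private int id;
    public string Name;

    public Item(string name)
    {
        id = idCount++;
        Name = name;
    }

    // Deserialization constructor, found by the formatter via reflection.
    protected Item(SerializationInfo info, StreamingContext context)
    {
        id = info.GetInt32("id");
        Name = info.GetString("name");
        if (id + 1 > idCount)
            idCount = id + 1; // never hand out an id that is already in use
    }

    // Tells the formatter exactly which values to write.
    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("id", id);
        info.AddValue("name", Name);
    }

    public int Id { get { return id; } }
    public static int NextId { get { return idCount; } }

    // Demo only: pretend the process restarted and the counter was lost.
    public static void ResetCounter() { idCount = 0; }
}

public static class ItemStore
{
    public static byte[] ToBytes(Item item)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, item);
            return stream.ToArray();
        }
    }

    public static Item FromBytes(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return (Item)new BinaryFormatter().Deserialize(stream);
        }
    }
}
```

With this shape, loading an item whose id is 2 into a process whose counter is back at 0 should leave NextId at 3, so newly created items do not collide with loaded ones.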

But still, object serialization is a great boon for most of us developers who don’t wish to tear our hair out writing code that is not even important for our work!

Ok. So, I discovered that if you customize serialization using the ISerializable interface, you need to repeat the work in all the derived classes: constructors are not inherited, so each derived class needs its own deserialization constructor. A lot of hard work I am not willing to do. Another trick is to have the base class implement IDeserializationCallback instead of ISerializable. Its single method, OnDeserialization, is called after deserialization is done, and you can write custom logic there to populate the static or the non-serialized fields.
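Here is a sketch of that second approach, under the same assumptions as before (.NET Framework, BinaryFormatter). TrackedEntity, SpecialEntity, EntityStore and ResetCounter are hypothetical names; only the static id counter idea is from my code.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class TrackedEntity : IDeserializationCallback
{
    private static int idCount = 0; // static: not written by the formatter
    private int id;                 // written automatically, even though private
    public string Name;

    public TrackedEntity(string name)
    {
        id = idCount++;
        Name = name;
    }

    // Called by the formatter once the whole object graph has been rebuilt.
    public virtual void OnDeserialization(object sender)
    {
        if (id + 1 > idCount)
            idCount = id + 1;
    }

    public int Id { get { return id; } }
    public static int NextId { get { return idCount; } }

    // Demo only: pretend the process restarted and the counter was lost.
    public static void ResetCounter() { idCount = 0; }
}

// The derived class needs only the attribute, no extra serialization code.
[Serializable]
public class SpecialEntity : TrackedEntity
{
    public int Extra;

    public SpecialEntity(string name, int extra) : base(name)
    {
        Extra = extra;
    }
}

public static class EntityStore
{
    public static object RoundTrip(object graph)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, graph);
            stream.Position = 0;
            return formatter.Deserialize(stream);
        }
    }
}
```

The fields are still handled by the default machinery, so SpecialEntity picks up the counter fix through the inherited OnDeserialization without any serialization code of its own. Note that the [Serializable] attribute itself is not inherited and has to be repeated on each derived class.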

Vint Cerf

Vinton Cerf, best known as a co-designer of the Internet Protocol, which is the substrate of the internet, gave a talk today in Bangalore at Ambedkar Bhavan. He is a winner of the ACM Turing Award, often described as the highest honour in Computer Science, and currently serves as VP and Chief Internet Evangelist at Google. One thing I found surprising was that even though Google organized the talk, I could not find any references to it anywhere on the Internet (searched on Google of course), except for this blog. I had expected Google to at least put up a searchable page online about its much advertised “Google Speaker Series”. I saw the advert in the papers and later wanted to find more details, but alas! it seems they don’t have a good PR person.

Anyway, coming back to the talk. I felt that the talk was a little too high level. That is mostly what happens in such talks where the speaker has to cover a lot of ground in a small space of time (Spacetime?). I felt Raghu Ramakrishnan handled it better — he elaborated on a specific application (DB Life). A few takeaways from the talk were:

  1. During the design of IP he decided to keep it independent of both the application (it just works on bytes) and the infrastructure (IP works on every conceivable connection system: telephone lines, Ethernet, ATM, VPN), and this has stood IP in good stead since. This independence ensured that lots of new applications (some of which even surprised Vint, such as an internet-enabled surfboard) as well as new tele-infrastructure could be accommodated.
  2. Mobile users in India are about 200m as compared to 40m or so internet users. What is the medium of choice for connectivity in the future?
  3. Of about 1b internet users, around 400m are in Asia, with China and Japan accounting for around 150m each. North America is only third, after Europe. And we thought the internet is in English.
  4. As more and more devices get online, there will be many more applications.
  5. Hardware has improved substantially thanks to Moore’s law, but software has not kept pace. Higher level languages such as Python have not helped much. Perhaps they should all shift to Lisp :). He also seemed to believe Ajax is a high level language. I am not sure how; perhaps my understanding is flawed, but I used to think it is only a mechanism for making asynchronous callbacks from web pages, with very few language level features (which would definitely be desirable).
  6. He made a strong case for formal methods in software analysis, to find bugs in software and be able to give guarantees.
  7. He seemed to believe that our current infrastructure needs to incorporate security as a first class design parameter. Concepts such as VPN sit atop traditional networks like a veil, but with increasing mobility, security needs to be built into the design at a fundamental level.
  8. Similarly, prediction of internet usage (similar to Erlang in telecom) and QoS are desirable but technically difficult, which makes them important problems in Computer Science.
  9. The Q&A was mostly irrelevant: What will Google do in future? What does Google use apart from PageRank? How can the internet be made more accessible in India? The answers were (mostly) even more irrelevant.

It would have been nice to hear more about his experiences in the actual design of IP.
