The Web in 1994

Found this video by Digital Equipment Corporation (Digital, later Altavista … and now I don’t know where) about how this new phenomenon called the Internet is taking over the world.

Ah, Nostalgia!

[From John Battelle]

¿ʞunɹp noʎ ǝɹɐ

Cool site. Helps you invert text – http://www.revfad.com/flip.html

[Thanks to Aashu for the site as well as the title :)]

Wiki Groaning

This is a hilarious article on the nerd bias in Wikipedia. While Wikipedia is a great resource, it carries a lot of peanut butter that nobody wants to eat, simply because most Wikipedia articles are written by people who love peanut butter.

Just go through the entries, and you’ll know why! Sample these:

  • Modern Warfare v. Lightsaber Combat
  • Prime Number v. Optimus Prime (the Transformers character)
  • Girlfriend v. Video Games
  • Half Life v. Half Life 2
  • Love v. Masturbation
  • Bathing v. Acne

[Thanks to TechCrunch for the lead]

    In the company of friends – Gates and Jobs

    Another article at Desicritics. 

    Would not rate my writing very high since I wrote it when I was half asleep, but I wanted to write about this monumental meeting between the two titans and so there it is.

    Digg on the LAMP Stack

    This is an interesting article on how Digg leverages the LAMP stack (Linux, Apache, MySQL and PHP). The article describes how Digg started with one server, added a load balancer and other tricks to scale up. Right now, they work with well over 100 servers and scalability always remains a challenge because they are growing so fast.

    I would have liked a longer article with more technical details, but something is better than nothing I guess.

    What would you like to Read today?

    Minekey was a wondrous experience. It all started about a year and a half back, when Delip came to IIT Kharagpur with a vision to solve the world’s information overload problem. The aim was simple: let content consumers get access to the information they are interested in.

    If we look at the world today, most content producers and aggregators produce content for the general audience. That means that the news will be of all flavors, all topics and categories. However, it also means that if you wish to track news on a particular topic from a plethora of sources (and there is no dearth of them!), it is like finding a needle in a haystack, or a key in an information mine – which is incidentally where Minekey got its name from (if you couldn’t guess it already ;-).

    As it goes, a handful of super charged students from IITKGP got together with Delip and Prof. Sudeshna Sarkar, and started working on this new project with a new model for incubation. The company was based in Santa Clara, and we were doing the R&D and the initial bootstrapping from Kharagpur.

    Ideas flew: personalization, networking, recommendation, search, social connections, groups, communities, feedback, ranking, clustering, collaborative clustering, geo-personalization, you name it. After lots of ideas, debates, deliberations, re-iterations, progress spreadsheets, throwaway code, reused old code, search and open source services, and many days and months, we had a strategy in place. We had created a news portal for the world at large (which was still an Alpha), we had strong customer leads, and a model seemed to be emerging.

    A lot of water has since flowed in the Ganges. There is a definite strategy, the company is well funded, there are people who have left plum jobs to work at Minekey, the business model has been refined and Minekey is staring at an immense opportunity. Kudos to the current team for taking a prototype and building a real product.

    Minekey has now launched recommendations for blogs! You get a sweet looking widget on your sidebar in a matter of minutes, and your friends will be able to get recommendations. Minekey monitors their clicks, and as users click through from the widget, the personalization kicks in to recommend more and more stories according to each user’s taste. (Sadly, WordPress doesn’t support JavaScript, otherwise you would have seen one right here. I need to find a workaround.)

    Go get it now!

    Andrew Tomkins on Web Search and Online Communities

    Went to another of the talks in the Big Thinkers Series by Yahoo! Bangalore. Andrew Tomkins talked about Web Search and Online communities. Andy is the Director of Search Research at Yahoo! Research.

    I was very disappointed by the talk. I had expected a lot more from the person who possibly determines the future of Web Search at one of the leading search engines in the world. The talk started off with Andy giving a slide show of images from Flickr which rate high on interestingness. That was the good part: the pictures were cool. But thereafter he went into why Flickr was a better social network than other networks. He gave some quantifiable metrics, such as the size of the maximal strongly connected components in the relationship graph and the number of nodes in the graph with a degree of more than some number k, with k plotted on the X-axis. For Flickr, it seems there are a number of people with 450 friends or more, while for another social networking site (a la LinkedIn/Orkut) the number is an order of magnitude smaller. I did not buy his argument that this indicates that Flickr is a more successful social network. Being able to maintain 450 friends is very difficult (ask me! I don’t even interact with many of my 800 odd friends on Orkut). Besides, the nature of the two social networks is very different. He also touched upon how social networks are interacting (Upcoming and Flickr).
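
To make those two metrics concrete, here is a minimal Python sketch of how one might compute them on a contact graph. It is my own illustration, not anything shown in the talk: the function names and the directed edge-list input (`(u, v)` meaning "u lists v as a contact") are assumptions.

```python
from collections import defaultdict

def nodes_with_degree_above(edges, k):
    """Count nodes whose out-degree (number of distinct contacts) exceeds k."""
    contacts = defaultdict(set)
    for u, v in edges:
        contacts[u].add(v)
    return sum(1 for friends in contacts.values() if len(friends) > k)

def largest_scc_size(edges):
    """Size of the largest strongly connected component (iterative Kosaraju)."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse finishing order;
    # each sweep collects exactly one strongly connected component.
    seen, best = set(), 0
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in reverse[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best
```

Sweeping k from 0 upwards and plotting `nodes_with_degree_above(edges, k)` would give the kind of curve described above, with k on the X-axis.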

    He then went into how the Internet is growing and the amount of data being generated. A back-of-the-envelope calculation (6B people typing away on computers for 4 hours every day) gives an upper bound of about 150 petabytes of data. However, that data is more and more decentralized. The amount of data which passes through the Yahoo! network is only about 11% of the web’s data right now, and that is falling fast. Nobody else even comes close (according to Andy). At the same time, one can consume only the data one wants, thanks of course to del.icio.us, RSS feeds and better personalization algorithms. That indicates that both content sources and content consumption are becoming more and more decentralized and democratized. At current rates, storing the amount of data being created will cost about $25M, a figure which should fall. Smaller players can crawl and store the content present on the web. This is great for entrepreneurs because it means that they can match the Big 3 (GOOG, MSFT, YHOO), at least in terms of storage!
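
The talk did not spell out the arithmetic behind the 150 petabyte bound, so here is my own rough reconstruction in Python. The population and hours come from the talk as I noted them; the typing speed (5 bytes per second) and the one-year window are purely my assumptions, picked because they land close to the quoted figure.

```python
PEOPLE = 6_000_000_000      # from the talk: 6B people
HOURS_PER_DAY = 4           # from the talk: 4 hours of typing every day
BYTES_PER_SECOND = 5        # ASSUMPTION: about 5 characters typed per second
DAYS = 365                  # ASSUMPTION: the bound covers one year

def typed_bytes(people=PEOPLE, hours=HOURS_PER_DAY,
                rate=BYTES_PER_SECOND, days=DAYS):
    """Upper bound on the bytes typed by everyone over the whole period."""
    seconds_per_day = hours * 3600
    return people * seconds_per_day * rate * days

total = typed_bytes()
petabytes = total / 1e15    # decimal petabytes (10**15 bytes)
print(f"about {petabytes:.0f} PB")
```

Under these assumptions the bound comes out at roughly 158 PB, which is in the same ballpark as the figure quoted in the talk. Change the typing rate or the time window and the number moves proportionally.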

    Some of the latest developments in search have been special treatment of specialized domains. For instance, if you search for weather, or movies, or flight information, current generation search engines are able to figure out the domain of the search query and provide custom UI for the results based on the domain. For instance, they might give movie timings at theaters near your ZIP code. This is going to become more ubiquitous with special treatment for a lot more domains being added. However, I am not sure how many domains can be supported by a rule-based treatment for each of them. Integration of search results of different types and genres of media (images, video, text) and ranking algorithms for the combined result set remain a challenge. We are going to see the addition of more and more social features as days go by. Crawling and collecting data in the light of new programming models (like Ajax) are going to be a challenge. Andy was not aware of any good solutions to this problem.
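
To illustrate why I doubt that a rule-per-domain approach scales, here is a minimal Python sketch of rule-based domain detection. It is a toy of my own, not how any real search engine does it: the `DOMAIN_RULES` table, the keyword patterns and the `detect_domain` name are all hypothetical.

```python
import re

# HYPOTHETICAL rule table: one hand-written pattern per supported domain.
DOMAIN_RULES = [
    ("weather", re.compile(r"\b(weather|forecast|temperature)\b", re.I)),
    ("movies", re.compile(r"\b(movies?|showtimes)\b", re.I)),
    ("flights", re.compile(r"\b(flights?|airfare)\b", re.I)),
]

def detect_domain(query):
    """Return the first domain whose rule matches, else plain 'web' search."""
    for domain, pattern in DOMAIN_RULES:
        if pattern.search(query):
            return domain
    return "web"

# The detected domain would then select a custom results UI.
print(detect_domain("weather in Bangalore"))
print(detect_domain("movie showtimes 95054"))
print(detect_domain("andrew tomkins"))
```

Every new domain needs another hand-written rule, and the rules start to overlap as the table grows (a query like "movie about a flight" matches two of them), which is the maintenance problem I have in mind.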

    The last part of the talk was a real disappointment. He started talking about some of his recent research on estimating properties of a hidden corpus on the basis of the answers to a number of queries. While there is no doubt that his research is worthwhile, this was perhaps the wrong forum for mathematical equations, especially so suddenly after having talked about general technology. I got the feeling that he wanted to talk about it just to show that he still does some technical work :-).

    Overall, the talk didn’t meet my expectations. The last one by Raghu Ramakrishnan was far better.

    [If I have missed out something, please point out in the comments. Thanks!]
