Monday, October 8, 2012

Liveblogging OSDI2012

I am attending OSDI 2012 in Hollywood, CA. There are lots of interesting papers here, and I will try to blog about the event. In particular, I am excited about Google’s Spanner talk, scheduled for tomorrow afternoon (Tuesday).


The day didn’t begin too well, because I happened to witness a roadside accident. I was on the bus going to the Loews hotel (where the conference is being held), and the bus was waiting at a red light. The light turned green, and even before the bus moved ahead, a white Toyota Prius sped to turn left. Another car came dashing along on the right of the bus lane, since it clearly had a green light, and before anybody noticed, there was a boom and a woman shouting: the Prius had been hit on its right passenger-side door. From what I could tell, nobody was injured. Other people were busy and my bus moved on. While this was a stupid accident that could have been avoided, I wish vehicular networks were mainstream by now. If the Prius had alerted its driver about a car coming towards it, hopefully it wouldn’t have turned left prematurely. But more than vehicular networks, I wish


Keynote


The keynote is on cancer genomics. The speaker is David Haussler from UCSC. Here is the abstract:



Cancer is a complex condition—patients present with thousands of subtypes involving different combinations of DNA mutations. Understanding cancer will require aggregating DNA data from many thousands of cancer genomes, facilitating the statistical power to distinguish patterns in the mutations. The rapidly plummeting cost of DNA sequencing will soon make cancer genome sequencing a widespread clinical practice. To anticipate this, UCSC has built a 5-petabyte database for tumor genomes that will be sequenced through National Cancer Institute projects—the Cancer Genomics Hub—and is tackling the significant computational challenges posed by storing, serving, and interpreting cancer genomics data.



Some of the questions/points raised:



  • There is an enormous opportunity to bring big data techniques to cancer genomics.

  • How do we identify mutations from this genomic data?

  • How do we map these mutations to the pathways that lead to cancer, which should help us prevent these cancers?


Flat Datacenter Storage



  • FDS is simple, scalable blob storage

  • distributed metadata management

  • Built on a CLOS network with distributed scheduling.

  • High read/write performance

  • fast failure recovery

  • high application performance.


Data is organized as blobs, and each blob has multiple tracts.


FDS consists of:

  • Tractserver: sits between raw disk and network.

  • Metadataserver

  • Client


GFS/Hadoop have the following problems:

  • Centralized metadata server

  • critical path of reads/writes

  • large (coarsely striped) writes


DHTs:

  • multiple hops to find data

  • slow recovery


FDS tries to position itself in between.


There is a tract location table that maps each tract locator to the disks that have to be read for that tract.
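To make the tract location table idea concrete, here is a minimal sketch of how a client might turn a blob and a tract number into a set of disks without asking a central server on every read. The talk notes only say that the table maps a locator to disks; the locator formula (hash of the blob identifier plus the tract number, modulo the table length), the use of SHA-1, and the names `tract_locator`, `lookup`, and the disk labels are all assumptions made for illustration.

```python
import hashlib

def tract_locator(blob_guid: bytes, tract_number: int, table_length: int) -> int:
    """Assumed scheme: hash the blob id, add the tract number, wrap around the table."""
    digest = hashlib.sha1(blob_guid).digest()
    blob_hash = int.from_bytes(digest[:8], "big")
    return (blob_hash + tract_number) % table_length

def lookup(table, blob_guid: bytes, tract_number: int):
    """Return the disks listed in the table row for this tract."""
    return table[tract_locator(blob_guid, tract_number, len(table))]

# Hypothetical table: each row lists the disks holding tracts with that locator.
table = [
    ["disk0", "disk1"],
    ["disk1", "disk2"],
    ["disk2", "disk0"],
]

for tract in range(4):
    print(tract, lookup(table, b"my-blob", tract))
```

With a scheme like this, consecutive tracts of one blob land on consecutive table rows, so a large sequential read spreads across many disks.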


CLOS:


Generally we have a tree structure for the DC architecture. FDS provisions as much bandwidth as each disk requires. Full bisection bandwidth is only stochastic. Long flows are bad for load balancing, so FDS generates a large number of short flows going to diverse destinations. But TCP likes long flows, so FDS creates “circuits” using RTS/CTS.
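As a rough illustration of the RTS/CTS idea, the sketch below shows a receiver that only lets a limited number of senders transmit at once. A sender first issues a request-to-send (RTS) and waits for a clear-to-send (CTS), and the rest queue up. The talk notes do not describe the actual mechanism, so the `Receiver` class, the `max_active` limit, and the FIFO queueing policy are assumptions, not the FDS implementation.

```python
from collections import deque

class Receiver:
    """Grants CTS to at most max_active senders at a time (assumed policy)."""

    def __init__(self, max_active: int = 1):
        self.max_active = max_active
        self.active = set()      # senders currently cleared to send
        self.waiting = deque()   # senders whose RTS is still pending

    def on_rts(self, sender: str) -> bool:
        """Handle a request-to-send; return True if CTS is granted right away."""
        if len(self.active) < self.max_active:
            self.active.add(sender)
            return True
        self.waiting.append(sender)
        return False

    def on_done(self, sender: str):
        """Sender finished; grant CTS to the next waiting sender, if any."""
        self.active.discard(sender)
        if self.waiting and len(self.active) < self.max_active:
            next_sender = self.waiting.popleft()
            self.active.add(next_sender)
            return next_sender
        return None

receiver = Receiver(max_active=1)
print(receiver.on_rts("a"))   # first sender is cleared immediately
print(receiver.on_rts("b"))   # second sender has to wait
print(receiver.on_done("a"))  # finishing "a" clears "b"
```

Limiting concurrent senders at the receiver is one way to keep many short flows from colliding at the same destination link.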






via MIND. IS BLOWN http://mindisblown.com/blog/2012/10/08/liveblogging-osdi2012/
