For a quant, it's all about the data. Given the cost of getting new data, any existing stores of data I can farm give me a jump start on refining my thesis. Tonight I am looking at Google Research to see what I can find. So far I have not found references to data, but in a slide presentation (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/stanford-295-talk.pdf) I see a reference to the system qualities they value:
– Simplicity
– Scalability
– Performance
– Reliability
– Generality
– Features
This is as good a list as any to define the attributes people value that are divorced from the functional aspect of the product (although I must confess I'd be guessing what they mean by Generality, and of course Features is another word for functionality, IMHO).
This looks like something that could be helpful...
- L1 cache reference 0.5 ns
- Branch mispredict 5 ns
- L2 cache reference 7 ns
- Mutex lock/unlock 100 ns
- Main memory reference 100 ns
- Compress 1K bytes with Zippy 10,000 ns
- Send 2K bytes over 1 Gbps network 20,000 ns
- Read 1 MB sequentially from memory 250,000 ns
- Round trip within same datacenter 500,000 ns
- Disk seek 10,000,000 ns
- Read 1 MB sequentially from network 10,000,000 ns
- Read 1 MB sequentially from disk 30,000,000 ns
- Send packet CA->Netherlands->CA 150,000,000 ns
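As a rough sketch of how these figures can be used for back-of-the-envelope estimates, the Python below copies the three "read 1 MB sequentially" numbers from the list and scales them linearly. The dictionary keys, the function name `sequential_read_ns`, the linear-scaling assumption, and the choice of 1 GB = 1024 MB are mine, not from the slides.

```python
# Nanoseconds to read 1 MB sequentially, copied from the list above.
READ_1MB_NS = {
    "memory": 250_000,
    "network": 10_000_000,
    "disk": 30_000_000,
}


def sequential_read_ns(megabytes, medium):
    """Estimate the time in ns to read `megabytes` MB sequentially.

    Assumes cost scales linearly with size and ignores seeks,
    contention and caching, so treat the result as an order of magnitude.
    """
    return megabytes * READ_1MB_NS[medium]


if __name__ == "__main__":
    one_gb_in_mb = 1024  # assumption: 1 GB = 1024 MB
    for medium in ("memory", "network", "disk"):
        seconds = sequential_read_ns(one_gb_in_mb, medium) / 1e9
        print(f"Read 1 GB from {medium}: {seconds:.2f} s")
```

Under those assumptions, scanning 1 GB takes roughly a quarter of a second from memory, about 10 seconds over the network and about 30 seconds from disk, which is the kind of gap that decides where a working data set should live.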
I find these slides interesting...
Source Code Philosophy
• Google has one large shared source base
– lots of lower-level libraries used by almost everything
– higher-level app or domain-specific libraries
– application specific code
• Many benefits:
– improvements in core libraries benefit everyone
– easy to reuse code that someone else has written in another context
• Drawbacks:
– reuse sometimes leads to tangled dependencies
• Essential to be able to easily search whole source base
– gsearch: internal tool for fast searching of source code
– huge productivity boost: easy to find uses, defs, examples, etc.
– makes large-scale refactoring or renaming easier
Software Engineering Hygiene
• Code reviews
• Design reviews
• Lots of testing
– unittests for individual modules
– larger tests for whole systems
– continuous testing system
• Most development done in C++, Java, & Python
– C++: performance critical systems (e.g. everything for a web query)
– Java: lower volume apps (advertising front end, parts of gmail, etc.)
– Python: configuration tools, etc.
Multi-Site Software Engineering
• Google has moved from one to a handful to 20+ engineering sites around the world in last few years
• Motivation:
– hire best candidates, regardless of their geographic location
• Issues:
– more coordination needed
– communication somewhat harder (no hallway conversations, time zone issues)
– establishing trust between remote teams important
• Techniques:
– online documentation, e-mail, video conferencing, careful choice of interfaces/project decomposition
– BigTable: split across three sites
Something else I found at Google Research was Google Correlate. Type in a term and it will find other terms whose search pattern matches. Try to correlate by time-series or geography. Kinda cool...
- "thesis" has a clear seasonal pattern, peaking in fall and spring, and matches the term "factors" (United States Web Search activity for thesis and factors, r=0.9628)
- The term "data analyst" has been trending up since 2008 after having been level in the period 2004 to 2008. It correlates with these other terms, with r ranging from 0.9218 to 0.9390:
- pain management
- biotin
- ignore
- hiring manager
- coordinator salary
- spondylosis
- psychiatric nurse
- how to answer
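To make the r values above concrete, here is a minimal sketch of the Pearson correlation coefficient between two time series, assuming that is what the reported r measures. The function name `pearson_r` and the two short weekly series are invented for illustration; they are not Google Correlate data or its API.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical weekly search-activity indices (made-up numbers).
    term_a = [1.0, 1.4, 2.1, 1.8, 1.2, 0.9]
    term_b = [0.8, 1.1, 1.9, 1.7, 1.0, 0.7]
    print(f"r = {pearson_r(term_a, term_b):.4f}")
```

A value near 1 only says the two series rise and fall together. It says nothing about why, which is how "data analyst" ends up sitting next to "biotin" and "spondylosis".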