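MapReduce: "map-style distributed search -- recursive simplification" software for fast search over massive data sets with timely feedback
MapReduce
The name combines the two words "map" and "reduce".
In Chinese it can be glossed as 地图 + 递归 (map + recursion), 地图 + 化简 (map + simplification), or 地图 + 还原 (map + restoration),
or, rendered literally: map--recursion, map--simplification, or map--restoration.
This coined compound refers to both a programming model and the software that implements it (see Appendices 1-2).
Appendix 1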
Programming model
Input & Output: each a set of key/value pairs
Programmer specifies two functions:
map (in_key, in_value) -> list(out_key, intermediate_value)
- Processes input key/value pair
- Produces set of intermediate pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
- Combines all intermediate values for a particular key
- Produces a set of merged output values (usually just one)
Inspired by similar primitives in LISP and other languages
http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0003.html
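As a rough illustration of the programming model above, here is a minimal single-process sketch in Python. The names run_mapreduce, map_max and reduce_max and the sample input are hypothetical; they are not taken from the slides or from Google's implementation, which distributes this work across many machines.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every input pair, group by key, then apply reduce_fn."""
    # Map phase: each (in_key, in_value) yields zero or more (out_key, value) pairs.
    intermediate = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, value in map_fn(in_key, in_value):
            intermediate[out_key].append(value)
    # Reduce phase: combine all intermediate values that share a key.
    return {key: reduce_fn(key, values) for key, values in sorted(intermediate.items())}


def map_max(in_key, in_value):
    # in_value is assumed to look like "a:3 b:5" (hypothetical format).
    for token in in_value.split():
        name, number = token.split(":")
        yield name, int(number)


def reduce_max(out_key, values):
    # Returns a list of merged output values (here just one).
    return [max(values)]


result = run_mapreduce([("log1", "a:3 b:5"), ("log2", "a:7")], map_max, reduce_max)
print(result)
```

The driver only depends on the two user-supplied functions, which is the point of the model: the grouping step in between is the framework's job.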
Example: Count word occurrences
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // output_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
Pseudocode: See appendix in paper for real code
http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0004.html
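The word-count pseudocode above can be approximated in runnable Python as follows. This is only a sketch under stated assumptions: word_count and the sample documents are hypothetical, words are split on whitespace, and the sort-then-group step stands in for whatever the real framework does between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents
    for word in input_value.split():
        yield word, "1"


def reduce_fn(output_key, intermediate_values):
    # output_key: a word; intermediate_values: an iterator of counts as strings
    result = 0
    for value in intermediate_values:
        result += int(value)
    return str(result)


def word_count(documents):
    # Run every map call, then sort so that equal keys become adjacent.
    pairs = [pair for name, text in documents.items() for pair in map_fn(name, text)]
    pairs.sort(key=itemgetter(0))
    # groupby hands each key's values to reduce_fn as one iterator.
    return {
        word: reduce_fn(word, (value for _, value in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    }


counts = word_count({"doc1": "the quick fox", "doc2": "the lazy dog the end"})
print(counts)
```

Counts are kept as strings only to mirror the pseudocode's EmitIntermediate(w, "1") and ParseInt(v); ordinary Python code would emit integers directly.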
Appendix 2:
http://code.google.com/intl/zh-CN/edu/submissions/mapreduce/listing.html
MapReduce in a Week
This page contains a comprehensive introduction to MapReduce including lectures, reading material, and programming assignments. The goal is to provide a set of lectures which can be integrated into existing systems courses such as Operating Systems, Networking, etc., which already take an "under the hood" approach to computer science. Prerequisite knowledge includes multithreading, synchronization (locks, semaphores, barriers, etc.), and sockets.
http://en.wikipedia.org/wiki/MapReduce
MapReduce is a patented[1] software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers.[2]
The framework is inspired by the map and reduce functions commonly used in functional programming,[3] although their purpose in the MapReduce framework is not the same as their original forms.[4]
MapReduce libraries have been written in C++, C#, Erlang, Java, OCaml, Perl, Python, Ruby, F#, R and other programming languages.
http://labs.google.com/papers/mapreduce.html
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
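The abstract above says the run-time system partitions the input, schedules work across machines, and handles the communication between them. The sketch below imitates only the partitioning and parallel scheduling on a single machine, using threads in place of machines. Everything in it is an assumption for illustration: NUM_REDUCERS, partition, map_task, reduce_task and run are hypothetical names, the hash-modulo partitioning is one common choice rather than something this text specifies, and machine failures are not modelled at all.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 3  # hypothetical number of reduce partitions


def partition(key, num_partitions=NUM_REDUCERS):
    # crc32 is deterministic across runs, unlike Python's built-in str hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_task(split):
    # One map task per input split; output is bucketed by reduce partition.
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for line in split:
        for word in line.split():
            buckets[partition(word)][word].append(1)
    return buckets


def reduce_task(bucket_group):
    # One reduce task per partition; it merges that partition's buckets
    # from every map task.
    merged = defaultdict(int)
    for bucket in bucket_group:
        for word, ones in bucket.items():
            merged[word] += sum(ones)
    return dict(merged)


def run(splits):
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(map_task, splits))
        # "Shuffle": route bucket r of every map output to reduce task r.
        per_partition = [[out[r] for out in map_outputs] for r in range(NUM_REDUCERS)]
        reduce_outputs = list(pool.map(reduce_task, per_partition))
    result = {}
    for part in reduce_outputs:
        result.update(part)
    return result


totals = run([["a b a"], ["b c"], ["a"]])
print(totals)
```

Because every occurrence of a key lands in the same partition, each reduce task can work independently of the others, which is what lets the two phases run in parallel.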
Appeared in: OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
http://research.google.com/people/jeff/index.html
Jeffrey Dean
Google Fellow
I joined Google in mid-1999, and I'm currently a Google Fellow in the Systems Infrastructure Group. My areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways. While at Google, I've worked on the following projects:
- The design and implementation of the initial version of Google's advertising serving system.
- The design and implementation of five generations of our crawling, indexing, and query serving systems, covering two and three orders of magnitude growth in number of documents searched, number of queries handled per second, and frequency of updates to the system. I recently gave a talk at WSDM'09 about some of the issues involved in building large-scale retrieval systems (slides).
- The initial development of Google's AdSense for Content product (involving both the production serving system design and implementation as well as work on developing and improving the quality of ad selection based on the contents of pages).
- The development of Protocol Buffers, a way of encoding structured data in an efficient yet extensible format, and a compiler that generates convenient wrappers for manipulating the objects in a variety of languages. Protocol Buffers are used extensively at Google for almost all RPC protocols, and for storing structured information in a variety of persistent storage systems. A version of the protocol buffer implementation has been open-sourced and is available at http://code.google.com/p/protobuf/.
- Some of the initial production serving system work for the Google News product, working with Krishna Bharat to move the prototype system he put together into a deployed system.
- Some aspects of our search ranking algorithms, notably improved handling for dealing with off-page signals such as anchortext.
- The design and implementation of the first generation of our automated job scheduling system for managing a cluster of machines.
- The design and implementation of prototyping infrastructure for rapid development and experimentation with new ranking algorithms.
- The design and implementation of MapReduce, a system for simplifying the development of large-scale data processing applications. A paper about MapReduce appeared in OSDI'04.
- The design and implementation of BigTable, a large-scale semi-structured storage system used underneath a number of Google products. A paper about BigTable appeared in OSDI'06.
- Some of the production system design for Google Translate, our statistical machine translation system. In particular, I designed and implemented a system for distributed high-speed access to very large language models (too large to fit in memory on a single machine).
- Some internal tools to make it easy to rapidly search our internal source code repository. Many of the ideas from this internal tool were incorporated into our Google Code Search product, including the ability to use regular expressions for searching large corpora of source code.
I enjoy developing software with great colleagues, and I've been fortunate to have worked with many wonderful and talented people on all of my work here at Google. To help ensure that Google continues to hire people with excellent technical skills, I've also been fairly involved in our engineering hiring process.
I received a Ph.D. in Computer Science from the University of Washington, working with Craig Chambers on whole-program optimization techniques for object-oriented languages in 1996. I received a B.S., summa cum laude from the University of Minnesota in Computer Science & Economics in 1990. From 1996 to 1999, I worked for Digital Equipment Corporation's Western Research Lab in Palo Alto, where I worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, I worked for the World Health Organization's Global Programme on AIDS, developing software to do statistical modelling, forecasting, and analysis of the HIV pandemic.
In 2009, I was elected to the National Academy of Engineering.
Selected Slides from Talks:
- WSDM 2009 keynote talk: Challenges in Building Large-Scale Information Retrieval Systems
- Stanford CS295 class lecture, Spring, 2007: Software Engineering Advice from Building Large-Scale Distributed Systems
Selected Publications:
- MapReduce: Simplified Data Processing on Large Clusters
  Communications of the ACM, vol. 51, no. 1 (2008), pp. 107-113.
  Jeffrey Dean and Sanjay Ghemawat.
- Large Language Models in Machine Translation
  In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 858-867.
  Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean.
- Bigtable: A Distributed Storage System for Structured Data
  In Proceedings of OSDI 2006, Seattle, WA, 2006.
  Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber.
- MapReduce: Simplified Data Processing on Large Clusters
  In Proceedings of OSDI 2004, San Francisco, CA, 2004.
  Jeffrey Dean and Sanjay Ghemawat.
- Web Search for a Planet: The Google Cluster Architecture
  In IEEE Micro, Vol. 23, No. 2, pages 22-28, March, 2003.
  Luiz Barroso, Jeffrey Dean, and Urs Hölzle.
- A Comparison of Techniques to Find Mirrored Hosts on the WWW
  JASIS (Journal of the American Society for Information Science) 51(12):1114-1122 (2000). Also presented at 1999 ACM Digital Library Workshop on Organizing Web Space (WOWS), Berkeley, CA, August 1999.
  Krishna Bharat, Andrei Broder, Jeffrey Dean, and Monika R. Henzinger.
- The Swift Java Compiler: Design and Implementation
  Compaq Western Research Laboratory. Research Report 2000/2, April 2000.
  Daniel J. Scales, Keith H. Randall, Sanjay Ghemawat, and Jeff Dean.
- Hardware Support for Out-of-Order Instruction Profiling on Alpha 21264a
  In Proceedings of 11th Hot Chips Symposium (1999), Palo Alto, CA, Aug., 1999.
  Jennifer Anderson, Lance Berc, Jeffrey Dean, Sanjay Ghemawat, Shun-Tak Leung, Mitch Lichtenberg, George Vernes, Mark Vandevoorde, Carl A. Waldspurger, William Weihl, and Jon White.
- Finding Related Pages in the World Wide Web
  In Proceedings of the Eighth World Wide Web Conference (WWW8), Toronto, Canada, May, 1999.
  Jeffrey Dean and Monika Henzinger.
- Transparent, Low-Overhead Profiling on Modern Processors
  Invited paper in 1998 Workshop on Profile and Feedback-Directed Compilation, Paris, France, October, 1998. Also gave invited talk at the workshop.
  Jennifer Anderson, Lance Berc, George Chrysos, Jeffrey Dean, Sanjay Ghemawat, Jamey Hicks, Shun-Tak Leung, Mitch Lichtenberg, Mark Vandevoorde, Carl A. Waldspurger, and William Weihl.
- ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors
  In Proceedings of the 30th Annual Symposium on Microarchitecture, Research Triangle Park, North Carolina, December, 1997.
  Jeffrey Dean, Jamey Hicks, Carl Waldspurger, William Weihl, and George Chrysos.
- Continuous Profiling: Where Have All the Cycles Gone?
  In Proceedings of 16th Symposium on Operating Systems Principles (1997), St. Malo, France, October, 1997. Selected as one of the four best papers at SOSP. An expanded version appears in a special issue of Transactions on Computer Systems, Vol. 15, Number 4, pp. 357-390 (November, 1997).
  Jennifer Anderson, Lance Berc, Jeffrey Dean, Sanjay Ghemawat, Monika Henzinger, Shun-Tak Leung, Dick Sites, Mark Vandevoorde, Carl Waldspurger, and William Weihl.
- Call Graph Construction in Object-Oriented Languages
  In Proceedings of 1997 Conference Object-Oriented Programming Languages, Systems, and Applications (OOPSLA'97), Atlanta, GA, October, 1997.
  David Grove, Greg DeFouw, Jeffrey Dean, and Craig Chambers.
- Continuous Profiling (It's 10:43; Do You Know Where Your Cycles Are?)
  In Proceedings of 9th Hot Chips Symposium (1997), Palo Alto, CA, Aug., 1997.
  William Weihl, Jennifer Anderson, Lance Berc, Jeffrey Dean, Sanjay Ghemawat, Monika Henzinger, Shun-Tak Leung, Dick Sites, Mark Vandevoorde, and Carl Waldspurger.
- Whole-Program Optimization of Object-Oriented Languages
  Ph.D. Dissertation, University of Washington, Dept. of Computer Science and Engineering, November, 1996.
- Vortex: An Optimizing Compiler for Object-Oriented Languages
  In Proceedings of 1996 Conference Object-Oriented Programming Languages, Systems, and Applications (OOPSLA'96), San Jose, CA, October, 1996.
  Jeffrey Dean, Greg DeFouw, David Grove, Vassily Litvinov, and Craig Chambers.
- Expressive, Efficient Instance Variables
  University of Washington Technical Report, February 1996.
  Jeffrey Dean, David Grove, Craig Chambers, and Vassily Litvinov.
- Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis
  In Proceedings of 1995 European Conference on Object-Oriented Programming (ECOOP'95), Aarhus, Denmark, August, 1995.
  Jeffrey Dean, David Grove, and Craig Chambers.
- A Framework for Selective Recompilation in the Presence of Complex Intermodule Dependencies
  In Proceedings of the Seventeenth International Conference on Software Engineering (ICSE 17), Seattle, WA, April, 1995.
  Craig Chambers, Jeffrey Dean, and David Grove.
- Profile-Guided Receiver Class Prediction
  In Proceedings of 1995 Conference on Object-Oriented Programming Languages, Systems, and Applications (OOPSLA'95), Austin, TX, October, 1995.
  David Grove, Jeffrey Dean, Charlie Garrett, and Craig Chambers.
- Selective Specialization for Object-Oriented Languages
  In Proceedings of 1995 Conference on Programming Language Design and Implementation (PLDI'95), June, 1995.
  Jeffrey Dean, Craig Chambers, and David Grove.
- Identifying Profitable Specialization in Object-Oriented Languages
  In Proceedings of the 1994 Workshop on Partial Evaluation and Semantics-based Program Manipulation (PEPM'94), Orlando, FL, June, 1994.
  Jeffrey Dean, Craig Chambers, and David Grove.
- Towards Better Inlining Decisions Using Inlining Trials
  In Proceedings of the 1994 Conference on Lisp and Functional Programming (L&FP'94), Orlando, FL, June, 1994.
  Jeffrey Dean and Craig Chambers.
- Epi Info: A General-purpose Microcomputer Program for Public Health Information Systems
  In American Journal of Preventive Medicine, vol. 7, pp. 178-182, 1991.
  Andrew Dean, Jeffrey Dean, Anthony Burton, and Richard Dicker.
- Software for Data Management and Analysis in Epidemiology
  In Journal of the World Health Forum, vol. 11, no. 1, 1990.
  Anthony Burton, Jeffrey Dean, and Andrew Dean.
Personal:
I've lived in lots of places in my life: Honolulu, HI; Manila, the Philippines; Boston, MA; West Nile District, Uganda; Boston (again); Little Rock, AR; Hawaii (again); Minneapolis, MN; Mogadishu, Somalia; Atlanta, GA; Minneapolis (again); Geneva, Switzerland; Seattle, WA; and (currently) Palo Alto, CA. I'm hard-pressed to pick a favorite, though: each place has its plusses and minuses.
One of my life goals is to play soccer and basketball on every continent. So far, I've done so in North America, South America, Europe, Asia, and Africa. I'm worried that Antarctica might be tough, though.