-
Notifications
You must be signed in to change notification settings - Fork 3
A project to develop a fully distributed MapReduce library for Haskell which makes using the MapReduce framework totally transparent for application programmers.
License
jdstmporter/Haskell-MapReduce
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
HASKELL MAPREDUCE LIBRARY This is an implementation of MapReduce for use on a single machine. It will automatically parallelise into multiple threads running on all available processors. The key idea is that map and reduce stages can just be treated as functions of type [s] -> [(s',a)] where s is the input data type, s' is the output data type and a is the output key type. The package providers a wrapper function liftMR that converts the map / reduce function into a monadic function, so building a processing chain is simply a matter of setting out liftMR map1 >>= liftMR reduce1 >>= liftMR map2 >>= liftMR reduce2 >>= . . . Then existing utility functions allow one to evaluate the processing chain on specified initial data. Key points are: (1) From the point of view of the applications programmer MapReduce is totally transparent; mappers and reducers can be written as simple functions; the wrapper function and the monad do all the work of distributing work across mappers / reducers, sorting the output based on key value, etc. (2) Writing a MapReduce function reduces to writing the mappers / reducers and the providing fewer than ten lines of boilerplate code. (3) The fact that the distinction between mappers and reducers is eliminate makes it possible to produce processing chains more general than those permitted by more traditional MapReduce frameworks, such as HADOOP. THE PACKAGE The package contains: (1) Full source code for the library, with Haddock documentation. (2) A Cabal file to build the library and generate documentation. To build the library and applications: mkdir MR cd MR git clone git://github.com/Julianporter/Haskell-MapReduce.git cd Haskell-MapReduce cabal install Cabal will the proceed to build the library, add it to your installation and create Haddock documentation. As well as the library, it will install two applications: (1) 'wordcount' : this does a simple word count on a supplied file of space- separated words and prints the result. (2) 'testwordcount' : this applies automated tests using the word count algorithm, using the QuickCheck testing framework. The maximum number of words is kept to less than 10,000 because for larger values the tests can become extremely slow. FOR MORE INFORMATION Project website: http://blog.jpembeddedsolutions.com/MapReduce
About
A project to develop a fully distributed MapReduce library for Haskell which makes using the MapReduce framework totally transparent for application programmers.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published