MySqlParallelizer
From SemanticLab
Introduction
MySqlParallelizer is a tool that integrates with the Apache Hadoop MapReduce framework to run text-analysis jobs in cluster environments.
Features
- JobManagerScript - manages the creation of Hadoop jobs
- MySqlParallelizer - counts the words in documents, stores the results in a MySQL database, and calculates statistical values
- TargetBuilder - creates database views that make analysis easier
- Together, these parts return the most significant words in a document
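The word-counting step can be illustrated with a minimal sketch. This is not the tool's actual code: it uses plain Java collections instead of the Hadoop API so that it runs without a cluster, and the class name WordCountSketch, the method names, and the tokenization rule (split on anything that is not a letter or digit) are assumptions made for illustration only. Storing the counts in MySQL and the statistical calculations are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of the MySqlParallelizer sources.
class WordCountSketch {

    // Analogous to a "map" step: split one document into lower-cased word tokens.
    // Assumption: a word is any run of letters or digits.
    static List<String> tokenize(String document) {
        List<String> words = new ArrayList<>();
        for (String token : document.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Analogous to a "reduce" step: sum the occurrences of each word.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : tokenize(document)) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("Hadoop runs jobs; jobs count words.");
        // Print each word with its count (HashMap iteration order is unspecified).
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real Hadoop job the two methods would correspond roughly to a mapper that emits one pair per word and a reducer that sums the pairs for each word; the sketch keeps both in a single process for readability.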
Configuration
Please refer to the master's thesis.
Sources
You can check out the complete source (Java, created with NetBeans 6.9) from Subversion:
https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/parallelizer/src
https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/targetbuilder/src
The executable JAR files are available at:
https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/jar
The JobManagerScript can be found at:
https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/

