MySqlParallelizer

From SemanticLab

Introduction

MySqlParallelizer is a tool that integrates with the Apache Hadoop MapReduce framework to run text-analysis jobs in cluster environments.

Features

  • JobManagerScript - manages the creation of Hadoop jobs
  • MySqlParallelizer - counts the words in documents, stores the results in a MySQL database, and calculates statistical values
  • TargetBuilder - creates database views that make analysis easier
  • Together, these components return the most significant words in a document
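
The word-counting step can be illustrated with a minimal sketch in plain Java, written in the map/reduce style that Hadoop jobs follow. This is not the MySqlParallelizer code: the class name WordCountSketch, the method names, and the tokenization rule are assumptions made for illustration, and the sketch uses neither the Hadoop API nor a database. The statistical values and the significance ranking are not described on this page, so they are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names are not taken from the MySqlParallelizer sources.
class WordCountSketch {

    // "Map" phase: emit one (word, 1) pair per token in the document.
    // Assumption: tokens are lower-cased runs of letters and digits.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        String[] tokens = document.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    // A TreeMap keeps the words in alphabetical order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reduce(map("The cluster runs the job"));
        System.out.println(counts); // prints {cluster=1, job=1, runs=1, the=2}
    }
}
```

In an actual Hadoop job the two phases would run as separate mapper and reducer tasks distributed across the cluster, and the resulting counts would be written to the database rather than printed.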

Configuration

Please refer to the master's thesis.


Sources

You can check out the complete source (Java, created with NetBeans 6.9) from Subversion:

https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/parallelizer/src
https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/targetbuilder/src


The executable jar-files are available at:

https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/jar


The JobManagerScript can be found at:

https://svn.semanticlab.net/svn/oss/thesis/MySqlParallelizer/