Big-Data-Project-SEIS-736

I took SEIS 736, Big Data Architecture, in Spring 2016 with Dr. Brad Rubin. The course explored big data technologies with an emphasis on Apache Hadoop, an open-source, Java-based framework for storing and processing extremely large data sets in a distributed computing environment. We also explored topics such as information retrieval and computer security, as well as technologies such as Apache Spark, Apache Hive, and MapReduce.

To complete the weekly assignments and the project, students had to install a Cloudera-configured virtual machine running the CentOS 6.3 Linux distribution on their own laptops. Apache Hadoop was installed on the VM as a single-node “pseudo” cluster. We used the University of Saint Thomas’s larger, multi-node cluster for research problems that could not be solved on our VM.

The final project required us to use Hadoop or a similar technology to analyze a large dataset of our choosing. Dr. Rubin suggested that we choose one of several open-source datasets available online. My research is based on data from the MovieLens web site (http://movielens.org), collected by University of Minnesota researchers at GroupLens Research.

I conducted my research primarily using Java and Apache Hadoop, buttressing my findings with sprinkles of SQL and Apache Hive. The code I wrote to conduct my research can be found in the source folder. The "research_and_analysis" file offers a detailed explanation of the source code in its overall analysis of the film dataset. The code is best explored in tandem with the research paper.
