Thursday, September 30, 2021

Hadoop Lab 2021-22

Date Resource Evaluation Feedback
27/08/2021
03/09/2021
23/09/2021
01/10/2021
Installation Manual
30/09/2021
04/10/2021
21/10/2021 Partitioner
29/10/2021 RecordContents Submit

First phase record submission: Contents

Second phase record submission: Contents

Lab Internal: Paper

30/09/2021 (61-92) and 01/10/2021 (1-34)

Word count program code: Code

07/10/2021 (61-92)

Experiment-1:

Frequent itemset: an itemset X is called frequent for a database 𝐷 if and only if it is contained in at least min support transactions of 𝐷, that is, support(X) >= min support.
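The definition above can be checked on a small example. The following is a minimal sketch, assuming support is measured as the fraction of transactions containing the itemset (which matches the percentage thresholds used in this experiment). The `support` helper and the toy transactions are hypothetical and are not taken from groceries.csv.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


# Toy database, invented for illustration only.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread"}),
]

min_support = 0.5  # assumed threshold: 50% of the transactions

print(support({"milk"}, transactions))                           # 0.75
print(support({"milk", "bread"}, transactions) >= min_support)   # True
```

Here {milk, bread} appears in 2 of 4 transactions, so it is frequent at a 50% threshold but would not be at 75%.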

1. Download the data set from here: groceries.csv 

2. Write a Map-Reduce program to find the frequent 1-itemsets with 25% min support

3. Write a Map-Reduce program to find the frequent 2-itemsets with 23% min support

4. Write a Map-Reduce program to find the frequent 3-itemsets with 20% min support
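One possible shape for the three programs above, sketched in plain Python rather than as a Hadoop job: the mapper emits every k-item subset of a transaction with a count of 1, and the reducer sums the counts and keeps the itemsets that reach the threshold. This is only a local simulation of the map and reduce phases. It assumes each input line holds one transaction as a comma-separated list of items, which may not match the actual layout of groceries.csv. All function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple

Itemset = Tuple[str, ...]


def map_transaction(line: str, k: int) -> Iterator[Tuple[Itemset, int]]:
    """Mapper: emit (k-itemset, 1) for every k-item subset of one transaction."""
    # Deduplicate and sort so the same itemset always produces the same key.
    items = sorted({item.strip() for item in line.split(",") if item.strip()})
    for combo in combinations(items, k):
        yield combo, 1


def reduce_counts(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, int]:
    """Shuffle and reduce: sum the counts emitted for each itemset."""
    totals: Dict[Itemset, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def frequent_itemsets(lines: List[str], k: int, min_support: float) -> Dict[Itemset, float]:
    """Return the k-itemsets whose relative support is at least min_support."""
    pairs = (pair for line in lines for pair in map_transaction(line, k))
    counts = reduce_counts(pairs)
    n = len(lines)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Toy input, invented for illustration only.
lines = ["milk,bread", "milk,butter", "bread,butter,milk", "bread"]
print(frequent_itemsets(lines, 1, 0.5))
print(frequent_itemsets(lines, 2, 0.5))
```

Enumerating all k-subsets in the mapper is simple but grows quickly with transaction length. A common refinement is to generate candidate k-itemsets only from the frequent (k-1)-itemsets found in the previous pass.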

Experiment-2:

1. Download the MovieLens data set

2. Using the ratings.csv file, write a Map-Reduce program to find the average rating of each movie, where the average rating of movie X is defined as (sum of ratings of movie X) / (total number of ratings of movie X). (This is not a great formula; come up with your own.)

3. Using the movies.csv file, write a Map-Reduce program to find the total number of movies in each genre.
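The two MovieLens tasks above can be sketched as a local simulation of the map and reduce phases in plain Python (not a Hadoop job). The sketch assumes ratings.csv has the columns userId,movieId,rating,timestamp and movies.csv has the columns movieId,title,genres with genres separated by "|". Check the header of the files you download, since the layout may differ between MovieLens releases. The function names and sample rows are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def data_rows(lines: Iterable[str]) -> List[List[str]]:
    """Parse CSV lines and drop the header row (titles may contain quoted commas)."""
    reader = csv.reader(lines)
    next(reader)  # skip header
    return list(reader)


def map_rating(row: List[str]) -> Iterator[Tuple[str, Tuple[float, int]]]:
    """Mapper: emit (movieId, (rating, 1)); assumed column order as stated above."""
    yield row[1], (float(row[2]), 1)


def reduce_average(pairs: Iterable[Tuple[str, Tuple[float, int]]]) -> Dict[str, float]:
    """Reducer: add up partial (sum, count) pairs, then divide once per movie."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for movie_id, (rating_sum, count) in pairs:
        totals[movie_id][0] += rating_sum
        totals[movie_id][1] += count
    return {movie_id: s / n for movie_id, (s, n) in totals.items()}


def map_genres(row: List[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (genre, 1) for each genre listed for one movie."""
    for genre in row[2].split("|"):
        yield genre, 1


def reduce_sum(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: count the movies per genre."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Sample rows, invented for illustration only.
ratings_csv = ["userId,movieId,rating,timestamp", "1,10,4.0,0", "2,10,3.0,0", "1,20,5.0,0"]
movies_csv = ["movieId,title,genres", '10,"Movie A, The (2000)",Comedy|Drama', "20,Movie B (2001),Drama"]

average_ratings = reduce_average(p for row in data_rows(ratings_csv) for p in map_rating(row))
genre_counts = reduce_sum(p for row in data_rows(movies_csv) for p in map_genres(row))
print(average_ratings)  # {'10': 3.5, '20': 5.0}
print(genre_counts)     # {'Comedy': 1, 'Drama': 2}
```

Emitting (sum, count) pairs rather than bare ratings is a deliberate choice: partial pairs can be added together in any order, so the same summing logic could serve as a combiner, whereas averaging partial averages would give wrong results.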

Data source for the first question:

Groceries Market Basket Dataset | Kaggle