From 388a85d49988192f0bc8c007c9066bc6cfc4ce4c Mon Sep 17 00:00:00 2001
From: Meiqi Guo <mei-qi.guo@student.ecp.fr>
Date: Fri, 17 Mar 2017 00:02:22 +0100
Subject: [PATCH] Update README.md

---
 README.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5d77185..0ed83fe 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,13 @@
 # Big Data Process Assignment 2
-I first try this
\ No newline at end of file
+## Pre-processing the input
+For the pre-processing step, the input consists of:
+* the document corpus [pg100.txt](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/input/pg100.txt)
+* the [Stopword file](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/input/Stopwords) which I made in assignment 1
+* the [Words with frequency file](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/input/wordfreq) of pg100.txt that I obtained by running assignment
+1 with a slight change to [MyWordCount](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment1/blob/master/MyWordCount.java).
+
+
+
+
+All the details are in my code, Process.java.
+
-- 
GitLab