From fd406f7a882cec5950e26868cba03676340e9a40 Mon Sep 17 00:00:00 2001
From: Meiqi Guo <mei-qi.guo@student.ecp.fr>
Date: Sat, 18 Mar 2017 11:11:06 +0100
Subject: [PATCH] Update Report.md

---
 Report.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Report.md b/Report.md
index d34bb8d..b11226d 100644
--- a/Report.md
+++ b/Report.md
@@ -364,6 +364,7 @@ public void reduce(Text key, Iterable<Text> values, Context context)
     	 }
       }
 ```
+
 [The excution time](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/Hadoop_IndexApproach.PNG) is `42seconds`, much less than Naive Approach.
 
 [Comparaison times](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/counter_IndexApproach.PNG) are 17, much less than Naive Approach. 
@@ -374,4 +375,14 @@ You can find the overview of hadoop below:
 
 See the complete code [here](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/IndexApproach.java). I didn't commit the output since it's empty for the sample.
 
+### Explain and justify the difference
+Approach            | Execution time | Number of comparisons
+------------------- | -------------- | ---------------------
+Naive Approach      | 4min 15s       | 11476
+Index Approach      | 42s            | 17
+
+The Index Approach is about six times faster than the Naive Approach (42s versus 4min 15s) and performs far fewer comparisons (17 versus 11476), even on a sample dataset.
+This is reasonable because the second method uses the inverted index to reduce the number of pair comparisons: it only compares documents that share an indexed term, which lets us skip the (huge) number of comparisons between non-similar documents.
+By contrast, the first method compares every pair of documents, i.e. n(n-1)/2 pairs, so it takes O(n^2) computational time and memory and thus needs much more time.
+
 
-- 
GitLab