Update BDPA_Assign2_WJIN.md

Commit 2e632454 authored 8 years ago by Wen Yao Jin
--- a/BDPA_Assign2_WJIN.md
+++ b/BDPA_Assign2_WJIN.md
@@ -400,10 +400,13 @@ The hadoop job overview:
 #### 3 Justification of difference
+The similar documents found by the job are listed [here](similardoc). Remember that we used a sampled file, so there are far fewer similar documents than there would be in the full input. Even so, similar documents are very rare relative to the length of the sampled file.
 | Job       | # of comparisons | Execution Time |
 |:----------------:|:----------------:|:--------------:|
 | NaiveApproach               | 365085           | 7m 50s         |
 | PrefilteringApproach               | 976               | 15s         |
 The naive approach compares every pair of documents, so the number of comparisons grows as O(n^2) in the number of documents n. It therefore needs much more time and memory, even in the shuffle and sort phase.
 The prefiltering approach is very efficient when similar documents are rare and documents are not very long, which is exactly our case. This explains the drastic performance difference.
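
The gap between the two comparison counts can be illustrated with a minimal Python sketch. It is not the Hadoop job from this assignment. It assumes the prefiltering approach is a prefix-filtering scheme over a Jaccard threshold, and the toy documents, the threshold of 0.5, and the function names are all hypothetical. As a side note, 365085 = 855 × 854 / 2, which is what comparing every pair among 855 documents would give.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    return len(a & b) / len(a | b)


def naive_pairs(docs):
    """Every unordered pair of documents: n * (n - 1) / 2 comparisons."""
    return set(combinations(sorted(docs), 2))


def prefilter_pairs(docs, threshold):
    """Candidate pairs that share at least one word in their prefixes.

    Assumed scheme: order each document's words from rarest to most
    frequent, keep the first |d| - ceil(threshold * |d|) + 1 of them,
    and index documents by those words only.
    """
    freq = Counter(w for words in docs.values() for w in words)
    index = defaultdict(set)
    for doc_id, words in docs.items():
        ordered = sorted(words, key=lambda w: (freq[w], w))
        keep = len(ordered) - ceil(threshold * len(ordered)) + 1
        for w in ordered[:keep]:
            index[w].add(doc_id)
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


# Hypothetical toy corpus: only d1 and d2 are similar.
docs = {
    "d1": {"hadoop", "map", "reduce", "job"},
    "d2": {"hadoop", "map", "reduce", "task"},
    "d3": {"spark", "rdd", "stage"},
    "d4": {"pig", "latin", "script"},
}

naive = naive_pairs(docs)
filtered = prefilter_pairs(docs, threshold=0.5)
print(len(naive), len(filtered))
```

On this toy corpus the naive approach compares all 6 pairs, while the prefix index yields a single candidate pair. Any pair whose Jaccard similarity reaches the threshold must share a word in the kept prefixes, so the filter drops comparisons without dropping similar pairs.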