Commit fd406f7a authored by Meiqi Guo

Update Report.md

parent b0c798fe
@@ -364,6 +364,7 @@ public void reduce(Text key, Iterable<Text> values, Context context)
}
}
```
[The execution time](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/Hadoop_IndexApproach.PNG) is `42 seconds`, much less than with the Naive Approach.
[The number of comparisons](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/counter_IndexApproach.PNG) is 17, far fewer than with the Naive Approach.
@@ -374,4 +375,14 @@ You can find the overview of hadoop below:
See the complete code [here](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/IndexApproach.java). I didn't commit the output since it's empty for the sample.
### Explain and justify the difference
Approach | Execution time | Number of comparisons
-------------- | -------------- | ---------------------
Naive Approach | 4 min 15 s | 11476
Index Approach | 42 s | 17

We can clearly see that the Index Approach is quicker than the Naive Approach, even on a sample dataset.
This is reasonable because the Index Approach uses an inverted index to reduce the number of pair comparisons, which lets it skip the (huge) number of comparisons between non-similar documents.
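The Naive Approach, by contrast, compares every pair of documents, so the number of comparisons grows as O(n²) with the number of documents n (11476 is exactly 152 × 151 / 2, which is consistent with all pairs among 152 documents), and it therefore needs much more time.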
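
The difference in comparison counts can be illustrated with a small stand-alone sketch. This is not the code in `IndexApproach.java` and it does not use Hadoop. The class `PairCounting`, its methods, and the four toy documents are hypothetical names introduced only for illustration, and the sketch assumes that two documents become a candidate pair as soon as they share one indexed word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PairCounting {

    // Naive approach: every unordered pair of documents is compared.
    public static long naivePairs(int n) {
        return (long) n * (n - 1) / 2;
    }

    // Index approach (simplified): build an inverted index word -> document ids,
    // then keep only the pairs of documents that appear together under some word.
    public static Set<String> candidatePairs(Map<String, Set<String>> docs) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            for (String word : doc.getValue()) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(doc.getKey());
            }
        }
        Set<String> pairs = new TreeSet<>();
        for (List<String> ids : index.values()) {
            Collections.sort(ids); // canonical order so each pair is stored once
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    pairs.add(ids.get(i) + "," + ids.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical toy documents, each reduced to its set of words.
        Map<String, Set<String>> docs = new HashMap<>();
        docs.put("d1", new HashSet<>(Arrays.asList("hadoop", "index")));
        docs.put("d2", new HashSet<>(Arrays.asList("index", "join")));
        docs.put("d3", new HashSet<>(Arrays.asList("spark")));
        docs.put("d4", new HashSet<>(Arrays.asList("flink")));

        System.out.println("naive pairs: " + naivePairs(docs.size()));
        System.out.println("candidate pairs: " + candidatePairs(docs));
    }
}
```

With these four toy documents the naive count is 6 pairs, while only `d1,d2` shares a word and survives as a candidate. The same effect, at a larger scale, is what the table above reports.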