Commit 6bb4cfa9 authored by Meiqi Guo

Update README.md

parent d9f27a33
@@ -22,7 +22,7 @@ else if (stopWords.contains(word)){
word.set(token.replaceAll("[^A-Za-z0-9]+", "").toLowerCase());
```
**keep each unique word only once per line**
**Keep each unique word only once per line**
We define a *hashset* where we store words
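
The per-line deduplication described above could look roughly like the following sketch. It is plain Java rather than the actual mapper: the class name `LineDedupSketch`, the method `uniqueWords`, and the use of `StringTokenizer` are assumptions made for illustration, and a `LinkedHashSet` stands in for the hashset only so that the output order is predictable.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not taken from the repository.
final class LineDedupSketch {

    // Returns the distinct cleaned words of one line, in order of first appearance.
    static Set<String> uniqueWords(String line, Set<String> stopWords) {
        Set<String> seen = new LinkedHashSet<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // Same cleaning as in the README snippet: strip non-alphanumerics, lowercase.
            String word = tokens.nextToken().replaceAll("[^A-Za-z0-9]+", "").toLowerCase();
            if (word.isEmpty() || stopWords.contains(word)) {
                continue; // skip tokens that are empty after cleaning, and stop words
            }
            seen.add(word); // a set silently ignores words already stored for this line
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and", "a"));
        // Prints [cat, dog]
        System.out.println(uniqueWords("The cat, the CAT and a dog!", stopWords));
    }
}
```

Because the set is rebuilt for every line, a word can still appear in many different lines; only repeats within a single line are dropped.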
@@ -53,8 +53,16 @@ I used two counters:
* the other one records the number of lines in the output, named *FinalLineNumCounter*, i.e. the count after all empty lines have been removed.
The result is shown below:
NUM = 124787
Final_NUM = 114815
So 9972 lines (124787 - 114815), nearly 10000, are empty.
![](https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/counters.PNG)
<img src="https://gitlab.my.ecp.fr/2014guom/BigDataProcessAssignment2/blob/master/output/counters.PNG" width="100px" height="80px" alt="简书">
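
As a rough illustration of the two-counter idea, here is a minimal sketch in plain Java. It does not use the Hadoop counter API; the class `LineCounterSketch`, its fields, and the cleaning regex are assumptions, and "empty" is taken to mean that nothing alphanumeric is left on the line after cleaning.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two counters, not taken from the repository.
final class LineCounterSketch {
    long lineNum = 0;       // every input line seen
    long finalLineNum = 0;  // lines that are still non-empty after cleaning

    void process(String line) {
        lineNum++;
        // Assumed cleaning step: keep only letters, digits and spaces, then trim.
        String cleaned = line.replaceAll("[^A-Za-z0-9 ]+", "").trim();
        if (!cleaned.isEmpty()) {
            finalLineNum++;
        }
    }

    long emptyLines() {
        return lineNum - finalLineNum;
    }

    public static void main(String[] args) {
        LineCounterSketch counters = new LineCounterSketch();
        List<String> lines = Arrays.asList("hello world", "", "   ", "!!!", "a b");
        for (String line : lines) {
            counters.process(line);
        }
        // Prints NUM = 5, Final_NUM = 2, empty = 3
        System.out.println("NUM = " + counters.lineNum
                + ", Final_NUM = " + counters.finalLineNum
                + ", empty = " + counters.emptyLines());
        // The reported figures give 124787 - 114815 = 9972 empty lines.
        System.out.println(124787L - 114815L);
    }
}
```

In a real MapReduce job the two increments would more likely go through the job's counters than through instance fields, since fields are local to one task.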
**Order the tokens of each line in ascending order of global frequency**