Mohammed Meftah / BDPA_Assign2_MMEFTAH
Repository graph
Git revision: 436854b4339b2a4ca606506359a4956c0c8d5415 (branch: master, default)
[Commit graph: branch master, commits dated 13 Feb to 19 Mar; labels shown: master, gitignore]
Cleaning all the useless files + gitignore
Efficient Method
clearing files for Qa
cleaning temporary files
Cleaning directories
All-pairs preprocessing and change of the preprocessing format (we add a key)
All changes: we take capital letters into account
Taking capital letters into account after Julien's advice
Preprocessing on the whole pg100.txt
Clearing of Unique_words, not needed
Preprocessing test on pg100_test (5 lines with 1 empty)
resuming preprocessing
Delete WordCount$Reduce.class
Delete WordCount$Map.class
change structure
remove previous output
start ass2
Q4 invertedIndex with frequencies
Q3 count unique words...
Inverted index without frequencies - Question 2
Q1.iv: 50 reducers + 1 combiner and map output compression
Q1.iii with BZip2 compression and 10 reducers; Snappy and Gzip are not working...
Merge branch 'master' of https://gitlab.my.ecp.fr/2014meftahm/bpa
data redundancy
Delete pg100.txt, pg3200.txt, pg31100.txt
Qa.i and Qa.ii: there is a problem with the combiner, we don't retrieve the number of stopwords... this is weird
reorganisation of the folder tree
full project
Merge branch 'master' of gitlab.my.ecp.fr:2014meftahm/bpa
data files
Add new file