Welcome to the very first Babel Day, a day on which we speak many programming languages at once.
The IT industry changes continuously, and the popularity of programming languages is a rollercoaster. Today JavaScript is popular, tomorrow it'll be C#, next week it'll probably be Rust, and then JavaScript again. If you want to survive in this jungle, you have to be flexible. This event aims to teach you that.
In this repository you can find two text files taken from this repository. These files contain the frequencies of words used in subtitles available on opensubtitles.org. Your task is to process these files and extract some interesting data.
Write a program that finds the following things in both files:
- longest word
- shortest word
- average word length
- number of words (count)
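One way to approach the list above, as a minimal sketch in Python (any language works for this event). It assumes each non-empty line has the form `word frequency`, separated by whitespace, which this README does not spell out, so check the actual files first. The names `word_stats` and `report` are made up for illustration.

```python
def word_stats(lines):
    """Return longest word, shortest word, average length and word count."""
    # Assumption: the word is the first whitespace-separated field on each line.
    words = [line.split()[0] for line in lines if line.strip()]
    if not words:
        return None
    return {
        "longest": max(words, key=len),   # the first one wins on ties
        "shortest": min(words, key=len),  # the first one wins on ties
        "average": sum(len(w) for w in words) / len(words),
        "count": len(words),
    }


def report(path):
    """Print the statistics for one file in the sample-output layout."""
    with open(path, encoding="utf-8") as f:
        stats = word_stats(f)
    print(f"==={path}===")
    if stats is None:
        print("no words found")
        return
    print(f"longest word: {stats['longest']}")
    print(f"shortest word: {stats['shortest']}")
    print(f"average length: {round(stats['average'])}")
    print(f"word count: {stats['count']}")
```

Calling `report("pl_full.txt")` and then `report("en_full.txt")` would print one block per file. Word length here is counted in Unicode characters, not bytes, which matters for the Polish file.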
Sample output (fake data):
===pl_full.txt===
longest word: cholerniedługiesłowo
shortest word: jajo
average length: 3
word count: 333
===en_full.txt===
longest word: areallylongword
shortest word: s
average length: 8
word count: 123456
A more challenging task. Find words that exist in both files and count them. (Hint) Pay attention to optimization. The given files are quite big, so computing the common part in the most straightforward way might take far too long.
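For the optimization hint, one common approach (not necessarily the intended one) is to load the words of each file into a hash set and intersect the two sets, instead of comparing every word with every other word. A sketch in Python, again assuming each line has the form `word frequency`; `load_words`, `common_words` and `report_common` are hypothetical names.

```python
import json


def load_words(lines):
    """Collect the first field of every non-empty line into a set."""
    # Assumption: the word is the first whitespace-separated field on each line.
    return {line.split()[0] for line in lines if line.strip()}


def common_words(lines_a, lines_b):
    """Return the set of words that appear in both inputs."""
    return load_words(lines_a) & load_words(lines_b)


def report_common(path_a, path_b):
    """Print the common words and their count in the sample-output layout."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        common = common_words(fa, fb)
    # json.dumps gives double-quoted strings, matching the sample output.
    print("common words:", json.dumps(sorted(common), ensure_ascii=False))
    print("count:", len(common))
```

A nested loop over both word lists does work proportional to the product of their sizes. Set membership checks take constant time on average, so the intersection is roughly linear in the total number of words.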
Sample output (fake data):
common words: ["Codecool", "babel", "day"]
count: 3
One hour is a really short time to implement something in a language that you don't know yet. Thus, split responsibilities in your team wisely.
Display all words that fall below the 20th percentile.
Resources: