Twitter Data Mining

The objective of this project was to build a dataset suitable for training text summarization models. We wanted to analyze stock market news and feed the results to trading bots, but the articles first had to be summarized for that analysis to be accurate. I was responsible for creating the dataset. To do this, we collected tweets posted by various newspapers and paired each one with the related article on the newspaper's website. Because most of these tweets are a condensed version of the article, each tweet can serve as a reference summary of its article.
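The pairing step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each tweet carries the URL of the article it links to and that article bodies have already been scraped into a dictionary keyed by URL. The `Tweet`, `Example`, and `build_pairs` names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    text: str
    url: Optional[str]  # assumed: link to the full article, if the tweet has one


@dataclass
class Example:
    document: str  # full article body (summarization input)
    summary: str   # tweet text (reference summary)


def build_pairs(tweets: List[Tweet], articles: Dict[str, str]) -> List[Example]:
    """Pair each tweet with the article it links to.

    `articles` maps an article URL to its scraped body text (hypothetical
    structure). Tweets without a link, or whose article is missing, are skipped.
    """
    pairs = []
    for tweet in tweets:
        if tweet.url is None:
            continue
        body = articles.get(tweet.url)
        # Drop inline links so the summary contains only the tweet's prose.
        summary = URL_PATTERN.sub("", tweet.text).strip()
        if body and summary:
            pairs.append(Example(document=body, summary=summary))
    return pairs
```

For example, a tweet "Apple beats estimates https://t.co/x" linking to a scraped article yields one pair whose summary is "Apple beats estimates" and whose document is the article body.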

To stay up to date, we listened to the live tweet stream of specific accounts, but the stream was massive and difficult to manage. We were also interested only in news about specific companies (e.g., Apple, Google), so we had to apply named entity recognition (NER) to each tweet. NER added significant per-tweet processing cost, which was a problem given the volume of the stream. These constraints led us to write the pipeline in a multi-threaded fashion.
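A minimal sketch of such a multi-threaded filter, assuming a producer/consumer design (the original architecture is not described): the main thread reads the stream into a bounded queue, and several worker threads run the entity check. A naive keyword match stands in for a real NER model, and `WATCHLIST`, `mentions_company`, and `filter_stream` are hypothetical names.

```python
import queue
import threading
from typing import Iterable, List, Set

# Hypothetical list of companies to track.
WATCHLIST = {"apple", "google"}


def mentions_company(text: str, watchlist: Set[str]) -> bool:
    """Stand-in for NER: case-insensitive match of whole tokens against the watchlist."""
    tokens = {tok.strip(".,!?:;\"'()").lower() for tok in text.split()}
    return not tokens.isdisjoint(watchlist)


def filter_stream(stream: Iterable[str], watchlist: Set[str], num_workers: int = 4) -> List[str]:
    # Bounded queue: the producer blocks if the workers fall behind.
    tasks = queue.Queue(maxsize=1000)
    results: List[str] = []
    lock = threading.Lock()
    sentinel = None  # signals a worker to stop

    def worker() -> None:
        while True:
            text = tasks.get()
            if text is sentinel:
                tasks.task_done()
                break
            if mentions_company(text, watchlist):
                with lock:  # protect the shared results list
                    results.append(text)
            tasks.task_done()

    workers = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Producer: in the real pipeline this loop would consume the live stream.
    for text in stream:
        tasks.put(text)
    for _ in workers:
        tasks.put(sentinel)
    for w in workers:
        w.join()
    return results  # order is not guaranteed across workers
```

In CPython, threads mainly help when workers wait on I/O or when the NER library releases the GIL during inference; for pure-Python, CPU-bound entity extraction, `multiprocessing` is the usual alternative.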