This project demonstrates a real-time stock market data streaming pipeline using Kafka, AWS EC2, S3, Glue, and other AWS services. The architecture is designed to simulate stock market data, stream it via Kafka, store it in S3, catalog it using AWS Glue, and query it with Amazon Athena.
- AWS Account
- EC2 Instance with necessary permissions
- Apache Kafka
- Python 3.x
- AWS CLI configured
- boto3 library
- Dataset for stock market simulation
- Download and extract Kafka (note: the download path version must match the tarball version):

  ```bash
  wget https://downloads.apache.org/kafka/3.7.0/kafka_2.12-3.7.0.tgz
  tar -xvf kafka_2.12-3.7.0.tgz
  ```
- Install Java (if not already installed):

  ```bash
  sudo yum install java-17-openjdk -y
  java -version
  ```
- Start ZooKeeper:

  ```bash
  cd kafka_2.12-3.7.0
  bin/zookeeper-server-start.sh config/zookeeper.properties
  ```
- Start the Kafka server (in a new terminal):

  ```bash
  export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
  cd kafka_2.12-3.7.0
  bin/kafka-server-start.sh config/server.properties
  ```
- Configure Kafka to advertise the public IP of your EC2 instance by editing `config/server.properties`:

  ```bash
  sudo nano config/server.properties
  # Set advertised.listeners to the public IP of the EC2 instance, e.g.:
  # advertised.listeners=PLAINTEXT://<Public_IP_of_EC2_Instance>:9092
  ```
- Create a Kafka topic:

  ```bash
  bin/kafka-topics.sh --create --topic demo_testing2 --bootstrap-server <Public_IP_of_EC2_Instance>:9092 --replication-factor 1 --partitions 1
  ```
- Start a Kafka producer:

  ```bash
  bin/kafka-console-producer.sh --topic demo_testing2 --bootstrap-server <Public_IP_of_EC2_Instance>:9092
  ```
- Start a Kafka consumer (in a new terminal):

  ```bash
  bin/kafka-console-consumer.sh --topic demo_testing2 --bootstrap-server <Public_IP_of_EC2_Instance>:9092
  ```
Use the provided Python scripts to simulate stock market data and produce messages to the Kafka topic.
- KafkaProducer.ipynb: Contains the producer logic for streaming stock market data.
- KafkaConsumer.ipynb: Contains the consumer logic for reading streamed data.
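For reference, here is a minimal sketch of what the producer logic might look like. It assumes the `kafka-python` library and a local CSV dataset; the file name `stock_data.csv`, its columns, and the one-second interval are illustrative placeholders, not fixed by the project:

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

# default=str covers non-JSON-native types such as NumPy numbers from pandas
producer = KafkaProducer(
    bootstrap_servers=["<Public_IP_of_EC2_Instance>:9092"],
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)

df = pd.read_csv("stock_data.csv")  # hypothetical dataset file

while True:
    # Simulate a live feed by sending one random row per second
    record = df.sample(1).to_dict(orient="records")[0]
    producer.send("demo_testing2", value=record)
    time.sleep(1)
```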
Configure your AWS S3 bucket and use Boto3 to store the Kafka data:
- Ensure your EC2 instance has the necessary IAM role with S3 permissions.
- Use the Boto3 library in your consumer script to upload data to S3.
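A minimal sketch of the consumer-side upload, assuming `kafka-python` and `boto3`; the bucket name and object-key pattern are placeholders for your own values:

```python
import json

import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo_testing2",
    bootstrap_servers=["<Public_IP_of_EC2_Instance>:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

s3 = boto3.client("s3")

for count, message in enumerate(consumer):
    # Write each record as its own JSON object so Glue/Athena can read it
    s3.put_object(
        Bucket="kafka-stock-market-demo",  # hypothetical bucket name
        Key=f"stock_market_{count}.json",
        Body=json.dumps(message.value),
    )
```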
- Create a Glue Crawler:
  - Set the S3 bucket as the data source.
  - Run the crawler to catalog the data.
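The crawler can also be created programmatically instead of through the console. A minimal `boto3` sketch, where the crawler name, IAM role, catalog database, and bucket path are all placeholders:

```python
import boto3

glue = boto3.client("glue")

# Create a crawler pointed at the S3 bucket holding the Kafka output
glue.create_crawler(
    Name="stock-market-crawler",            # hypothetical crawler name
    Role="AWSGlueServiceRole-StockMarket",  # hypothetical IAM role with S3 access
    DatabaseName="stock_market_db",         # hypothetical catalog database
    Targets={"S3Targets": [{"Path": "s3://kafka-stock-market-demo/"}]},
)

glue.start_crawler(Name="stock-market-crawler")
```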
- Use the AWS Glue Data Catalog to query and analyze the data with Amazon Athena.
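A minimal `boto3` sketch of issuing an Athena query against the crawled table; the database, table, and results location are placeholders (the actual table name is derived by the crawler from your S3 path):

```python
import boto3

athena = boto3.client("athena")

# Run a simple query against the table cataloged by the Glue crawler
response = athena.start_query_execution(
    QueryString="SELECT * FROM kafka_stock_market_demo LIMIT 10;",  # hypothetical table
    QueryExecutionContext={"Database": "stock_market_db"},          # hypothetical database
    ResultConfiguration={
        "OutputLocation": "s3://kafka-stock-market-demo/athena-results/"
    },
)
print(response["QueryExecutionId"])
```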
- Start ZooKeeper and the Kafka broker as described in the setup instructions.
- Run the Kafka producer script to simulate and stream stock market data.
- Run the Kafka consumer script to read the streamed data and upload it to S3.
- Use AWS Glue and Athena to catalog and query the data.
This project provides a scalable and efficient pipeline for real-time stock market data streaming and analysis using Kafka and AWS services. The architecture leverages distributed systems and cloud computing to handle large volumes of data.
For any queries, please reach out to Kartik Pandit at kartikpandit712@gmail.com.
