diff --git a/2026/day-01/README.md b/2026/day-01/README.md deleted file mode 100644 index 118d4e7f4e..0000000000 --- a/2026/day-01/README.md +++ /dev/null @@ -1,99 +0,0 @@ -# Day 01 – Introduction to DevOps and Cloud - -## Task -Today’s goal is to **set the foundation for your DevOps journey**. - -You will create a **90-day personal DevOps learning plan** that clearly defines: -- What is your understanding of DevOps and Cloud Engineering? -- Why you are starting learning DevOps & Cloud? -- Where do you want to reach? -- How you will stay consistent every single day? - -This is not a generic plan. -This is your **career execution blueprint** for the next 90 days. - ---- - -## Expected Output -By the end of today, you should have: - -- A markdown file named: - `learning-plan.md` - -or - -- A hand written plan for the next 90 Days (Recommended) - - -The file/note should clearly reflect your intent, discipline, and seriousness toward becoming a DevOps engineer. - ---- - -## Guidelines -Follow these rules while creating your plan: - -- Mention your **current level** - (student / fresher / working professional / non-IT background, etc.) -- Define **3 clear goals** for the next 90 days - (example: deploy a production-grade application on Kubernetes) -- Define **3 core DevOps skills** you want to build - (example: Linux troubleshooting, CI/CD pipelines, Kubernetes debugging) -- Allocate a **weekly time budget** - (example: 2–2.5 hours per day on weekdays, 4-6 hours weekends) -- Keep the document **under 1 page** -- Be honest and realistic; consistency matters more than perfection - ---- - -## Resources -You may refer to: - -- TrainWithShubham [course curriculum](https://english.trainwithshubham.com/JOSH_BATCH_10_Syllabus_v1.pdf) -- TrainWithShubham DevOps [roadmap](https://docs.google.com/spreadsheets/d/1eE-NhZQFr545LkP4QNhTgXcZTtkMFeEPNyVXAflXia0/edit?gid=2073716385#gid=2073716385) -- Your own past experience and career aspirations - -Avoid over-researching today. 
The focus is **clarity**, not depth. - ---- - -## Why This Matters for DevOps -DevOps engineers succeed not just because of tools, but because of: - -- Discipline -- Ownership -- Long-term thinking -- Ability to execute consistently - -In real jobs, no one tells you exactly what to do every day. -This task trains you to **take ownership of your own growth**, just like a real DevOps engineer. - -A clear plan: -- Reduces confusion -- Prevents burnout -- Keeps you focused during tough days - ---- - -## Submission -1. Fork this `90DaysOfDevOps` repository -2. Navigate to the `2026/day-01/` folder -3. Add your `learning-plan.md` file -4. Commit and push your changes to your fork - ---- - -## Learn in Public -Share your Day 01 progress on LinkedIn: - -- Post 2–3 lines on why you’re starting **#90DaysOfDevOps** -- Share one goal from your learning plan -- Optional: screenshot of your markdown file or a professional picture - -Use hashtags: -#90DaysOfDevOps -#DevOpsKaJosh -#TrainWithShubham - - -Happy Learning -**TrainWithShubham** \ No newline at end of file diff --git a/2026/day-01/learning_plan.md b/2026/day-01/learning_plan.md new file mode 100644 index 0000000000..357ae471ae --- /dev/null +++ b/2026/day-01/learning_plan.md @@ -0,0 +1,64 @@ +# 90-Day DevOps Learning Plan + +## Current Level +Working professional with DevOps engineering experience. + +--- + +## Understanding of DevOps & Cloud Engineering + +**DevOps** is a culture and practice that bridges development and operations through automation, collaboration, and continuous improvement. It focuses on shortening development cycles, increasing deployment frequency, and ensuring reliable releases through CI/CD, IaC, monitoring, and feedback loops. + +**Cloud Engineering** involves designing, deploying, and managing scalable infrastructure on cloud platforms (AWS, Azure, GCP), leveraging services like compute, storage, networking, and managed solutions to build resilient, cost-effective systems. 
+ +--- + +## Why I'm Learning DevOps & Cloud + +- To stay current with evolving tools and best practices in the DevOps ecosystem +- To build production-grade expertise that delivers measurable business value +- To transition from operational tasks to strategic infrastructure design and optimization +- To strengthen my ability to architect cloud-native solutions at scale + +--- + +## Where I Want to Reach (90-Day Goals) + +1. **Deploy a production-grade microservices application on Kubernetes** with auto-scaling, monitoring, and GitOps-based deployments +2. **Build and maintain a fully automated CI/CD pipeline** using Jenkins/GitLab CI with security scanning, testing, and blue-green deployments +3. **Achieve proficiency in Infrastructure as Code** by managing multi-environment cloud infrastructure using Terraform and Ansible + +--- + +## Core DevOps Skills to Build + +1. **Advanced Kubernetes Management** - Pod troubleshooting, networking, security policies, Helm charts, and cluster optimization +2. **CI/CD Pipeline Mastery** - Building robust pipelines with automated testing, security gates, artifact management, and deployment strategies +3. 
**Observability & Incident Response** - Setting up comprehensive monitoring, logging, alerting with Prometheus/Grafana/ELK, and conducting RCA + +--- + +## Weekly Time Budget + +- **Weekdays (Mon-Fri):** 2-2.5 hours daily (hands-on labs, documentation, learning) +- **Weekends (Sat-Sun):** 4-6 hours per day (projects, debugging, writing blog posts) +- **Total weekly commitment:** 18-24 hours + +--- + +## Consistency Strategy + +- **Daily standup with myself:** 10-minute review each morning of what I'll learn today +- **Public accountability:** Share progress on LinkedIn/GitHub weekly +- **Hands-on first:** Build, break, fix - no passive watching without implementation +- **Document everything:** Maintain daily notes in a GitHub repo to track learnings and blockers +- **No zero days:** Even 30 minutes counts; consistency beats intensity +- **Weekend projects:** Apply weekly learnings to real-world scenarios every Saturday/Sunday + +--- + +**Start Date:** [Today's Date] +**End Date:** [90 Days from Today] +**Tracking:** GitHub repository with daily commit streak + +*This is my commitment to myself. 90 days of focused execution.* diff --git a/2026/day-02/linux-architecture-notes.md b/2026/day-02/linux-architecture-notes.md new file mode 100644 index 0000000000..98d8e37497 --- /dev/null +++ b/2026/day-02/linux-architecture-notes.md @@ -0,0 +1,292 @@ +# Day 02 – Linux Architecture, Processes, and systemd + +## What We'll Learn Today +Understanding how Linux works is like learning how a car engine works before you start driving. It helps you fix problems faster and make better decisions as a DevOps engineer! + +--- + +## Linux Architecture: The Big Picture + +Think of Linux as a well-organized building with different floors: + +``` +┌─────────────────────────────────────┐ +│ Applications & User Programs │ (What you interact with) +│ (Firefox, Docker, VS Code, etc.) 
│ +├─────────────────────────────────────┤ +│ User Space │ (Where programs run) +│ (Libraries, System Tools) │ +├─────────────────────────────────────┤ +│ System Calls (Interface) │ (Communication bridge) +├─────────────────────────────────────┤ +│ Linux Kernel │ (The brain of the system) +│ (Process, Memory, Device Manager) │ +├─────────────────────────────────────┤ +│ Hardware │ (Physical components) +│ (CPU, RAM, Disk, Network) │ +└─────────────────────────────────────┘ +``` + +--- + +## Core Components of Linux + +### 1️The Linux Kernel (The Brain) + +**What is it?** +The kernel is the core of the operating system. It's like the manager of a restaurant who coordinates everything. + +**What does it do?** +- **Process Management**: Decides which program gets to use the CPU and when +- **Memory Management**: Allocates RAM to programs and makes sure they don't interfere with each other +- **Device Management**: Talks to your hardware (keyboard, mouse, disk, network card) +- **File System Management**: Organizes how files are stored and retrieved +- **Security**: Controls who can access what + +**Simple Analogy:** +Think of the kernel as a traffic controller at a busy intersection, making sure all cars (programs) move smoothly without crashing into each other. + +--- + +### 2️User Space (Where We Live) + +**What is it?** +User space is where all your applications and programs run. This is separated from the kernel for safety. + +**Why the separation?** +If a program crashes in user space, it won't bring down the entire system. The kernel stays protected! + +**What lives here?** +- Applications (browsers, text editors, Docker) +- System utilities (ls, cat, grep) +- Libraries (code that programs share) +- Your shell (bash, zsh) + +**Simple Analogy:** +User space is like the dining area of a restaurant. Customers (programs) can eat here, but they can't go into the kitchen (kernel) and mess with the stove! 
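You can actually watch this separation from the shell. User-space programs never reach into kernel memory directly; they ask the kernel through system calls, and the kernel publishes its own view of the system through the `/proc` virtual filesystem. A minimal sketch (plain Linux, nothing extra to install):

```bash
# /proc is a virtual filesystem: the kernel generates these "files"
# on the fly, so user space can read kernel state without entering it.
head -4 /proc/self/status   # the kernel's bookkeeping for this very process
ls /proc/$$                 # everything the kernel exposes about your shell
```

Tools like `ps` and `top` are ordinary user-space programs that work by reading exactly these files.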
+ +--- + +### 3️Init System / systemd (The Startup Manager) + +**What is Init?** +Init is the **first process** that starts when Linux boots up. It's like the opening manager of a store who turns on all the lights and gets everything ready. + +**What is systemd?** +systemd is the modern init system used by most Linux distributions today. It replaced the older "SysV init" system. + +**Why does it matter?** +- Starts and stops services (like web servers, databases) +- Manages dependencies (starts things in the right order) +- Monitors services and restarts them if they crash +- Handles system logging +- Much faster boot times than old init systems + +**Simple Analogy:** +systemd is like a stage manager at a theater. It makes sure all actors (services) come on stage at the right time, in the right order, and if someone misses their cue, it gets them back on stage! + +--- + +## How Processes Work in Linux + +### What is a Process? + +A process is simply a **program that's running**. When you double-click an app, you create a process. + +**Key Points:** +- Every process has a unique ID called **PID** (Process ID) +- The first process (systemd) has PID 1 +- Every process (except PID 1) has a parent process (PPID) + +### Process Lifecycle + +``` +1. Creation (Fork) + ↓ +2. Execution (Exec) + ↓ +3. Running + ↓ +4. Waiting/Sleeping (if needed) + ↓ +5. Termination +``` + +### How Are Processes Created? + +Linux creates new processes using two system calls: + +**1. fork()** – Makes a copy of the current process +- The parent process creates a child process +- The child is almost identical to the parent + +**2. exec()** – Replaces the child process with a new program +- After forking, the child calls exec() to run a different program + +**Simple Example:** +``` +When you type "ls" in your terminal: +1. Your shell (bash) forks itself +2. The child process calls exec(ls) +3. Now the child is running "ls" command +4. When done, the child exits +5. 
The parent (bash) continues +``` + +--- + +## Process States + +A process can be in different states: + +| State | Symbol | What It Means | +|-------|--------|---------------| +| **Running** | R | Currently using the CPU | +| **Sleeping** | S | Waiting for something (like user input) | +| **Stopped** | T | Paused (you can resume it) | +| **Zombie** | Z | Finished but parent hasn't collected it yet | +| **Dead** | X | Completely terminated | + +**Check process states:** +```bash +ps aux +``` + +--- + +## systemd Deep Dive + +### Why systemd Matters for DevOps + +As a DevOps engineer, you'll use systemd **every single day** to: +- Start/stop services (nginx, docker, databases) +- Check service status +- View logs +- Set services to start on boot +- Troubleshoot why services failed + +### Key systemd Concepts + +**1. Units** +Everything in systemd is a "unit". Types include: +- `.service` – Services (nginx, docker) +- `.socket` – Network sockets +- `.timer` – Scheduled tasks (like cron) +- `.mount` – File systems +- `.target` – Groups of units + +**2. Unit Files** +Configuration files that describe how to manage a service. 
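As a concrete sketch, a minimal unit file for a hypothetical `myapp.service` could look like this (the service name and binary path are invented for illustration, not part of the course material):

```
# /etc/systemd/system/myapp.service  (hypothetical example)
[Unit]
Description=My example application
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After creating or editing a unit file, run `sudo systemctl daemon-reload` so systemd re-reads it, then start the service with `systemctl start`.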
+ +**Location:** +``` +/etc/systemd/system/ (custom services) +/lib/systemd/system/ (system services) +``` + +### Essential systemd Commands + +```bash +# Start a service +sudo systemctl start nginx + +# Stop a service +sudo systemctl stop nginx + +# Restart a service +sudo systemctl restart nginx + +# Check status +sudo systemctl status nginx + +# Enable (start on boot) +sudo systemctl enable nginx + +# Disable (don't start on boot) +sudo systemctl disable nginx + +# View logs for a service +sudo journalctl -u nginx + +# View real-time logs +sudo journalctl -u nginx -f + +# List all running services +systemctl list-units --type=service --state=running + +# Check if a service failed +systemctl is-failed nginx +``` + +--- + +## Practical Examples for DevOps + +### Example 1: Check if Docker is Running + +```bash +systemctl status docker +``` + +If it's not running: +```bash +sudo systemctl start docker +sudo systemctl enable docker # Start on boot +``` +--- + +### Example 2: View Service Logs + +```bash +# Last 100 lines +journalctl -u docker -n 100 + +# Real-time logs (like tail -f) +journalctl -u docker -f + +# Logs from today +journalctl -u docker --since today +``` + +## Key Takeaways for DevOps Engineers + +1. **Linux has layers**: Hardware → Kernel → User Space → Applications +2. **The kernel manages everything**: processes, memory, devices, files +3. **Processes are created using fork() and exec()** +4. **systemd is your service manager**: Start, stop, monitor, and troubleshoot services +5. 
**Learn systemctl and journalctl**: These are your daily tools for managing services
+
+
+---
+
+## Quick Reference Cheat Sheet
+
+### Process Commands
+```bash
+ps aux            # List all processes
+ps -ef            # Another format
+top               # Real-time process viewer
+htop              # Better top (needs installation)
+kill <PID>        # Stop a process
+kill -9 <PID>     # Force kill
+pgrep <name>      # Find PID by name
+pkill <name>      # Kill by name
+```
+
+### systemd Commands
+```bash
+systemctl start <service>
+systemctl stop <service>
+systemctl restart <service>
+systemctl status <service>
+systemctl enable <service>
+systemctl disable <service>
+systemctl list-units --type=service
+journalctl -u <service>
+journalctl -u <service> -f
+systemctl daemon-reload
+```
+
+---
+
+*The best way to learn is by doing it.*
\ No newline at end of file
diff --git a/2026/day-03/linux-commands-cheatsheet.md b/2026/day-03/linux-commands-cheatsheet.md
new file mode 100644
index 0000000000..85820fc0e9
--- /dev/null
+++ b/2026/day-03/linux-commands-cheatsheet.md
@@ -0,0 +1,280 @@
+# Day-03: Linux Commands Cheat Sheet
+
+## Process Management
+
+### ps - Process Status
+```bash
+ps aux               # List all processes with detailed info (a=all users, u=user-oriented, x=include non-terminal)
+ps -ef               # Full format listing (alternative to aux)
+ps aux | grep nginx  # Find specific process
+ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head   # Show top memory consumers
+```
+
+### top/htop - Real-time Process Monitoring
+```bash
+top              # Interactive process viewer
+top -u username  # Show processes for specific user
+htop             # Enhanced interactive process viewer (if installed)
+```
+
+### kill - Terminate Processes
+```bash
+kill -9 PID          # Force kill process (SIGKILL)
+kill -15 PID         # Graceful termination (SIGTERM) - default
+killall nginx        # Kill all processes by name
+pkill -f "pattern"   # Kill processes matching pattern
+```
+
+### systemctl - Service Management
+```bash
+systemctl status nginx   # Check service status
+systemctl start nginx    # Start service
+systemctl stop nginx     # Stop service
+systemctl restart nginx  # Restart service
+systemctl reload
nginx # Reload configuration without restart +systemctl enable nginx # Enable service on boot +systemctl disable nginx # Disable service on boot +systemctl list-units --type=service --state=running # List running services +``` + +### journalctl - System Logs +```bash +journalctl -u nginx # Show logs for specific service +journalctl -f # Follow logs in real-time +journalctl -n 50 # Show last 50 lines +journalctl --since "1 hour ago" # Logs from last hour +journalctl -p err # Show only error-level messages +journalctl -xe # Show recent logs with explanation +``` + +### nohup & bg/fg - Background Processes +```bash +nohup command & # Run command immune to hangups +jobs # List background jobs +bg %1 # Resume job 1 in background +fg %1 # Bring job 1 to foreground +``` + +--- + +## File System + +### ls - List Directory Contents +```bash +ls -la # Long format with hidden files (l=long, a=all) +ls -lh # Human-readable file sizes +ls -lt # Sort by modification time +ls -ltr # Sort by time, reverse (oldest first) +ls -lS # Sort by file size +``` + +### find - Search for Files +```bash +find /var/log -name "*.log" # Find files by name +find /home -type f -mtime -7 # Files modified in last 7 days +find /tmp -type f -size +100M # Files larger than 100MB +find /var -name "*.log" -mtime +30 -delete # Delete old log files +find . 
-type f -exec chmod 644 {} \; # Execute command on found files +``` + +### du - Disk Usage +```bash +du -sh * # Summary of each item in current directory (s=summary, h=human-readable) +du -sh /var/log # Total size of directory +du -h --max-depth=1 # Size of subdirectories, 1 level deep +du -ah | sort -rh | head -20 # Top 20 largest files/directories +``` + +### df - Disk Free Space +```bash +df -h # Human-readable disk space +df -i # Show inode usage +df -hT # Include filesystem type +``` + +### tar - Archive Files +```bash +tar -czf archive.tar.gz /path/to/dir # Create compressed archive (c=create, z=gzip, f=file) +tar -xzf archive.tar.gz # Extract compressed archive (x=extract) +tar -tzf archive.tar.gz # List contents without extracting (t=list) +tar -xzf archive.tar.gz -C /dest/path # Extract to specific directory +``` + +### grep - Search Text +```bash +grep -r "error" /var/log # Recursive search (r=recursive) +grep -i "error" file.log # Case-insensitive search +grep -n "error" file.log # Show line numbers +grep -v "info" file.log # Invert match (exclude lines) +grep -A 5 "error" file.log # Show 5 lines after match +grep -B 5 "error" file.log # Show 5 lines before match +grep -C 5 "error" file.log # Show 5 lines before and after +``` + +### chmod/chown - Permissions +```bash +chmod 755 script.sh # rwxr-xr-x permissions +chmod +x script.sh # Add execute permission +chmod -R 644 /var/www # Recursive permission change +chown user:group file # Change owner and group +chown -R www-data:www-data /var/www # Recursive ownership change +``` + +### ln - Create Links +```bash +ln -s /path/to/file link # Create symbolic link (s=symbolic) +ln file hardlink # Create hard link +``` + +--- + +## Networking & Troubleshooting + +### netstat - Network Statistics (legacy) +```bash +netstat -tuln # List listening ports (t=TCP, u=UDP, l=listening, n=numeric) +netstat -plant # Show process using ports (requires root, p=program, a=all) +netstat -r # Show routing table +``` + +### ss - 
Socket Statistics (modern alternative to netstat) +```bash +ss -tuln # List listening TCP/UDP ports +ss -tulpn # Include process information +ss -s # Show summary statistics +ss -o state established # Show established connections with timer info +``` + +### curl - Transfer Data +```bash +curl -I https://example.com # Fetch headers only (I=head) +curl -o file.txt https://example.com/file # Save to file (o=output) +curl -L https://example.com # Follow redirects (L=location) +curl -X POST -d "data" https://api.example.com # POST request +curl -v https://example.com # Verbose output +curl -k https://example.com # Ignore SSL certificate errors +``` + +### wget - Download Files +```bash +wget https://example.com/file.zip # Download file +wget -c https://example.com/file.zip # Continue interrupted download +wget -r -np -k https://example.com # Mirror website (r=recursive, np=no parent, k=convert links) +``` + +### ping - Test Connectivity +```bash +ping -c 4 google.com # Send 4 packets (c=count) +ping -i 2 google.com # 2 second interval between packets +``` + +### traceroute - Trace Network Path +```bash +traceroute google.com # Show route packets take +traceroute -n google.com # Don't resolve hostnames (faster) +``` + +### dig - DNS Lookup +```bash +dig example.com # Query DNS +dig example.com +short # Brief output +dig @8.8.8.8 example.com # Use specific DNS server +dig example.com MX # Query mail servers +dig -x 8.8.8.8 # Reverse DNS lookup +``` + +### nslookup - DNS Query (alternative) +```bash +nslookup example.com # Simple DNS lookup +nslookup example.com 8.8.8.8 # Use specific DNS server +``` + +### tcpdump - Packet Analyzer +```bash +tcpdump -i eth0 # Capture on interface eth0 +tcpdump -i eth0 port 80 # Capture HTTP traffic +tcpdump -i eth0 -w capture.pcap # Write to file +tcpdump -i eth0 host 192.168.1.1 # Capture traffic to/from specific host +tcpdump -i eth0 -n # Don't resolve hostnames +``` + +### iptables - Firewall Rules +```bash +iptables -L -n -v # List 
all rules (L=list, n=numeric, v=verbose) +iptables -A INPUT -p tcp --dport 80 -j ACCEPT # Allow HTTP +iptables -D INPUT 3 # Delete rule 3 from INPUT chain +iptables -F # Flush all rules (careful!) +``` + +### nc (netcat) - Network Swiss Army Knife +```bash +nc -zv host 80 # Test if port is open (z=scan, v=verbose) +nc -l 8080 # Listen on port 8080 +nc host 8080 < file.txt # Send file over network +``` + +### lsof - List Open Files +```bash +lsof -i :80 # Show what's using port 80 +lsof -i TCP:1-1024 # Show processes using ports 1-1024 +lsof -u username # Show files opened by user +lsof -c nginx # Show files opened by nginx +lsof -p PID # Show files opened by specific process +``` + +### ip - Network Configuration (modern alternative to ifconfig) +```bash +ip addr show # Show IP addresses +ip link show # Show network interfaces +ip route show # Show routing table +ip -s link # Show interface statistics +ip addr add 192.168.1.100/24 dev eth0 # Add IP address +``` + +--- + +## Quick Troubleshooting Workflows + +### High CPU Usage +```bash +top +ps aux --sort=-%cpu | head -10 +``` + +### High Memory Usage +```bash +free -h +ps aux --sort=-%mem | head -10 +``` + +### Disk Space Issues +```bash +df -h +du -sh /* | sort -rh | head -10 +find /var/log -type f -size +100M +``` + +### Network Connectivity Issues +```bash +ping -c 4 8.8.8.8 +traceroute google.com +dig example.com +curl -I https://example.com +``` + +### Port Troubleshooting +```bash +ss -tulpn | grep :80 +lsof -i :80 +netstat -tulpn | grep :80 +``` + +### Service Not Starting +```bash +systemctl status service-name +journalctl -u service-name -n 50 +journalctl -xe +``` + +--- + diff --git a/2026/day-04/linux-practice.md b/2026/day-04/linux-practice.md new file mode 100644 index 0000000000..473063e995 --- /dev/null +++ b/2026/day-04/linux-practice.md @@ -0,0 +1,544 @@ +# Day 04 – Linux Practice: Processes and Services + +## Today's Mission +Practice Linux fundamentals by running actual commands and 
understanding what they show you. This is **hands-on learning** - no theory, just doing! + +--- + +## Part 1: Process Checks + +### Command 1: `ps aux` - List All Running Processes + +**What I ran:** +```bash +ps aux | head -20 +``` + +**What it does:** +- `ps aux` = Show all processes from all users +- `a` = Show processes for all users +- `u` = Display user-oriented format +- `x` = Include processes without a terminal + +**Real Output:** +``` +USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND +root 1 13.7 0.1 352336 17552 ? Ssl 18:39 0:00 /process_api +root 11 25.0 0.0 10848 2988 ? S 18:39 0:00 /bin/sh +root 12 66.6 0.0 15996 8132 ? R 18:39 0:00 ps aux +``` + +**What I learned:** +- **PID** = Process ID (unique identifier) +- **%CPU** = CPU usage percentage +- **%MEM** = Memory usage percentage +- **STAT** = Process state (R=Running, S=Sleeping, Z=Zombie) +- **TIME** = Total CPU time used +- PID 1 is always the init process (systemd on most systems) + +--- + +### Command 2: `ps -ef` - Show Parent-Child Relationships + +**What I ran:** +```bash +ps -ef | head -15 +``` + +**What it does:** +- Shows every process with PPID (Parent Process ID) +- Helps understand which process started which + +**Real Output:** +``` +UID PID PPID C STIME TTY TIME CMD +root 1 0 1 18:39 ? 00:00:00 /process_api +root 27 1 33 18:40 ? 00:00:00 /bin/sh -c ps -ef +root 28 27 50 18:40 ? 
00:00:00 ps -ef +``` + +**What I learned:** +- **PPID** = Parent Process ID +- Process 1 has PPID 0 (it's the root of the process tree) +- Process 28 (ps) was started by process 27 (shell) +- Everything traces back to PID 1 + +--- + +### Command 3: `pgrep` - Find Process by Name + +**What I ran:** +```bash +pgrep -l ssh +pgrep -l cron +pgrep -l process +``` + +**What it does:** +- Searches for processes by name +- `-l` flag shows both PID and name + +**Real Output:** +``` +1 process_api +``` + +**What I learned:** +- Much easier than using `ps aux | grep` +- Returns just the PID (useful for scripting) +- With `-l` flag, also shows the process name + +**Useful variations:** +```bash +pgrep -u root # Find processes owned by root +pgrep -c nginx # Count how many nginx processes +pkill -9 nginx # Kill all nginx processes (dangerous!) +``` + +--- + +### Command 4: `top` - Real-Time Process Monitor + +**What I ran:** +```bash +top -b -n 1 | head -20 +``` + +**What it does:** +- Shows processes in real-time (like Task Manager on Windows) +- `-b` = Batch mode (for capturing output) +- `-n 1` = Run only 1 iteration + +**Real Output:** +``` +top - 18:40:12 up 0 min, 0 user, load average: 0.00, 0.00, 0.00 +Tasks: 4 total, 1 running, 3 sleeping, 0 stopped, 0 zombie +%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi +MiB Mem : 9216.0 total, 9186.8 free, 29.2 used +MiB Swap: 0.0 total, 0.0 free, 0.0 used + + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 1 root 20 0 352868 18460 0 S 30.0 0.2 0:00.41 process_a +``` + +**What I learned:** +- **Load average**: System load over last 1, 5, 15 minutes +- **Tasks**: Total processes and their states +- **%Cpu(s)**: us=user, sy=system, id=idle, wa=waiting for I/O +- **Memory**: Total, free, used RAM +- Press `q` to quit top in interactive mode +- Press `h` for help in interactive mode + +**Pro Tips:** +```bash +top # Interactive mode +htop # Better visual interface (needs install) +top -u username # Show processes for 
specific user +``` + +--- + +## Part 2: Service Checks (systemd) + +### Command 5: `systemctl status` - Check Service Status + +**What I ran:** +```bash +systemctl status ssh +systemctl status docker +systemctl status nginx +``` + +**Example Output (SSH service):** +``` +● ssh.service - OpenBSD Secure Shell server + Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled) + Active: active (running) since Wed 2026-01-28 10:15:23 UTC; 2h 15min ago + Docs: man:sshd(8) + man:sshd_config(5) + Main PID: 1234 (sshd) + Tasks: 1 (limit: 4915) + Memory: 2.5M + CPU: 45ms + CGroup: /system.slice/ssh.service + └─1234 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups + +Jan 28 10:15:23 server systemd[1]: Starting OpenBSD Secure Shell server... +Jan 28 10:15:23 server sshd[1234]: Server listening on 0.0.0.0 port 22. +Jan 28 10:15:23 server systemd[1]: Started OpenBSD Secure Shell server. +``` + +**What I learned:** +- **Loaded**: Where the service file is located and if it's enabled +- **Active**: Current state (running, stopped, failed) +- **Main PID**: The main process ID for this service +- **Tasks**: Number of tasks/threads +- **Memory/CPU**: Resource usage +- **CGroup**: Process hierarchy +- Recent log entries shown at bottom + +--- + +### Command 6: `systemctl list-units` - List All Services + +**What I ran:** +```bash +systemctl list-units --type=service --state=running +systemctl list-units --type=service --state=failed +systemctl list-units --type=service +``` + +**Example Output (Running Services):** +``` +UNIT LOAD ACTIVE SUB DESCRIPTION +accounts-daemon.service loaded active running Accounts Service +cron.service loaded active running Regular background program +dbus.service loaded active running D-Bus System Message Bus +docker.service loaded active running Docker Application Container +networkd-dispatcher.service loaded active running Dispatcher daemon for systemd +ssh.service loaded active running OpenBSD Secure Shell server 
+systemd-journald.service loaded active running Journal Service +systemd-logind.service loaded active running Login Service +``` + +**What I learned:** +- `LOAD` = Whether systemd loaded the unit file +- `ACTIVE` = Current state +- `SUB` = More detailed state +- Easy way to see what's running on your system +- Can filter by state (running, failed, dead) + +**Useful commands:** +```bash +systemctl list-units --type=service --state=failed # Find broken services +systemctl list-units --all # Show everything +systemctl list-unit-files --type=service # All service files +``` + +--- + +### Command 7: Service Management Commands + +**What I ran (for practice):** +```bash +# Check if a service is enabled (starts on boot) +systemctl is-enabled ssh + +# Check if a service is active +systemctl is-active ssh + +# Enable a service (start on boot) +sudo systemctl enable docker + +# Disable a service +sudo systemctl disable nginx + +# Start a service +sudo systemctl start nginx + +# Stop a service +sudo systemctl stop nginx + +# Restart a service +sudo systemctl restart nginx + +# Reload configuration without restarting +sudo systemctl reload nginx +``` + +**What I learned:** +- `enable/disable` = Controls boot behavior (doesn't start/stop now) +- `start/stop` = Controls current state (doesn't affect boot) +- `restart` = Stop then start (brief downtime) +- `reload` = Reload config without downtime (if supported) +- Need `sudo` for most service management commands + +--- + +## Part 3: Log Checks + +### Command 8: `journalctl` - View System Logs + +**What I ran:** +```bash +# View logs for SSH service +journalctl -u ssh + +# View last 50 lines +journalctl -u ssh -n 50 + +# Follow logs in real-time (like tail -f) +journalctl -u ssh -f + +# View logs since today +journalctl -u ssh --since today + +# View logs from specific time +journalctl -u ssh --since "2026-01-28 10:00:00" + +# View logs between times +journalctl -u ssh --since "10:00" --until "11:00" +``` + +**Example Output:** 
+``` +Jan 28 10:15:23 server systemd[1]: Starting OpenBSD Secure Shell server... +Jan 28 10:15:23 server sshd[1234]: Server listening on 0.0.0.0 port 22. +Jan 28 10:15:23 server sshd[1234]: Server listening on :: port 22. +Jan 28 10:15:23 server systemd[1]: Started OpenBSD Secure Shell server. +Jan 28 12:30:45 server sshd[5678]: Accepted publickey for admin from 192.168.1.100 +Jan 28 12:30:45 server sshd[5678]: pam_unix(sshd:session): session opened for user admin +``` + +**What I learned:** +- `journalctl` is systemd's log viewer +- `-u` flag specifies which service +- `-n` shows last N lines +- `-f` follows logs in real-time (Ctrl+C to stop) +- `--since` and `--until` for time filtering +- Logs are stored in binary format (not plain text files) + +**Useful options:** +```bash +journalctl -p err # Only errors +journalctl -p warning # Warnings and above +journalctl -b # Logs since last boot +journalctl -b -1 # Logs from previous boot +journalctl --disk-usage # How much space logs use +journalctl --vacuum-time=2weeks # Clean old logs +``` + +--- + +### Command 9: `tail` - View Traditional Log Files + +**What I ran:** +```bash +# View last 50 lines of auth log +sudo tail -n 50 /var/log/auth.log + +# Follow log file in real-time +sudo tail -f /var/log/syslog + +# View last 100 lines of nginx access log +sudo tail -n 100 /var/log/nginx/access.log + +# View last 50 lines of nginx error log +sudo tail -n 50 /var/log/nginx/error.log +``` + +**Example Output (auth.log):** +``` +Jan 28 12:30:45 server sshd[5678]: Accepted publickey for admin from 192.168.1.100 +Jan 28 12:30:45 server sshd[5678]: pam_unix(sshd:session): session opened for user admin +Jan 28 12:45:12 server sudo: admin : TTY=pts/0 ; PWD=/home/admin ; USER=root ; COMMAND=/bin/systemctl status nginx +Jan 28 12:45:12 server sudo: pam_unix(sudo:session): session opened for user root +``` + +**What I learned:** +- Traditional log files are in `/var/log/` +- `tail -f` is invaluable for watching logs live +- 
Different services have different log files +- Need `sudo` to read most log files +- Press Ctrl+C to stop following + +**Common log locations:** +``` +/var/log/syslog # General system log +/var/log/auth.log # Authentication logs +/var/log/kern.log # Kernel logs +/var/log/dmesg # Boot messages +/var/log/nginx/ # Nginx logs +/var/log/apache2/ # Apache logs +/var/log/mysql/ # MySQL logs +``` + +--- + +## Part 4: Mini Troubleshooting Example + +### Scenario: Docker Service Won't Start + +**Step 1: Check if Docker is running** +```bash +systemctl status docker +``` + +**Output shows:** +``` +● docker.service - Docker Application Container Engine + Loaded: loaded (/lib/systemd/system/docker.service; enabled) + Active: failed (Result: exit-code) since Wed 2026-01-28 14:30:12 UTC + Process: 3456 ExecStart=/usr/bin/dockerd (code=exited, status=1/FAILURE) +``` + +**Problem identified:** Service is `failed` + +--- + +**Step 2: Check the logs** +```bash +journalctl -u docker -n 50 +``` + +**Output shows:** +``` +Jan 28 14:30:12 server dockerd[3456]: failed to start daemon: error initializing graphdriver: +Jan 28 14:30:12 server dockerd[3456]: driver not supported +Jan 28 14:30:12 server systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE +Jan 28 14:30:12 server systemd[1]: docker.service: Failed with result 'exit-code'. 
+```
+
+**Problem identified:** Issue with storage driver
+
+---
+
+**Step 3: Check Docker process**
+```bash
+pgrep -l docker
+```
+
+**Output:**
+```
+(no output - process not running)
+```
+
+---
+
+**Step 4: Check configuration file**
+```bash
+sudo cat /etc/docker/daemon.json
+```
+
+**Found the issue:** Misconfigured storage driver
+
+---
+
+**Step 5: Fix the configuration**
+```bash
+sudo nano /etc/docker/daemon.json
+# Fix the config
+```
+
+---
+
+**Step 6: Reload and restart**
+```bash
+sudo systemctl daemon-reload
+sudo systemctl start docker
+```
+
+---
+
+**Step 7: Verify it's working**
+```bash
+systemctl status docker
+```
+
+**Output:**
+```
+● docker.service - Docker Application Container Engine
+   Loaded: loaded (/lib/systemd/system/docker.service; enabled)
+   Active: active (running) since Wed 2026-01-28 14:35:45 UTC
+```
+
+**Success!**
+
+---
+
+## What I Learned Today
+
+### Process Management
+1. `ps aux` shows all processes with resource usage
+2. `ps -ef` shows parent-child relationships (PPID)
+3. `pgrep` is the easiest way to find processes by name
+4. `top` gives real-time view of system resources
+
+### Service Management
+5. `systemctl status` shows detailed service information
+6. `systemctl list-units` shows all running services
+7. Always use `sudo` for starting/stopping services
+8. `enable` ≠ `start` (boot vs now)
+
+### Log Investigation
+9. `journalctl -u <service>` for systemd service logs
+10. `tail -f` for real-time log monitoring
+11. Logs are your best friend when troubleshooting
+
+### Troubleshooting Workflow
+1. Check status first (`systemctl status`)
+2. Read the logs (`journalctl -u <service>`)
+3. Look for error messages
+4. Check configuration files
+5. Make changes
+6. Reload and restart
+7.
Verify it worked
+
+---
+
+## My Personal Cheat Sheet
+
+### Quick Process Checks
+```bash
+ps aux | grep <name>           # Find specific process
+pgrep -l <name>                # Simpler way to find process
+top                            # Interactive process viewer
+kill <PID>                     # Stop a process
+kill -9 <PID>                  # Force kill (last resort)
+```
+
+### Quick Service Checks
+```bash
+systemctl status <service>     # Check service status
+systemctl start <service>      # Start service
+systemctl stop <service>       # Stop service
+systemctl restart <service>    # Restart service
+systemctl enable <service>     # Start on boot
+systemctl disable <service>    # Don't start on boot
+```
+
+### Quick Log Checks
+```bash
+journalctl -u <service>            # View service logs
+journalctl -u <service> -f         # Follow logs live
+journalctl -u <service> -n 50      # Last 50 lines
+tail -f /var/log/syslog            # Follow system log
+tail -n 100 /var/log/auth.log      # Last 100 auth events
+```
+
+---
+
+## Additional Commands I Want to Remember
+
+```bash
+# See all failed services
+systemctl --failed
+
+# See service dependencies
+systemctl list-dependencies <service>
+
+# Show service configuration
+systemctl show <service>
+
+# Check system boot time
+systemd-analyze
+
+# Find which services slow down boot
+systemd-analyze blame
+
+# Watch logs for multiple services
+journalctl -u service1 -u service2 -f
+
+# Get help on any command
+man ps
+man systemctl
+man journalctl
+```
+
+---
+
+**Date Practiced:** January 28, 2026
+**Time Spent:** 2 hours
+**Feeling:** More confident with Linux fundamentals.
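One extra trick I want to remember alongside the `kill` commands above: signal `0` sends nothing at all — `kill -0` only checks whether a PID exists and whether I have permission to signal it, which makes it handy in scripts. A small sketch (the PIDs here are just illustrative):

```shell
#!/bin/sh
# kill -0 delivers no signal; it only tests whether the process
# exists and we are allowed to signal it.
pid=$$   # our own shell, guaranteed to be alive
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is alive"
else
    echo "process $pid is gone"
fi

# A PID large enough that it should not exist on this system:
if kill -0 99999999 2>/dev/null; then
    echo "unexpectedly alive"
else
    echo "no such process"
fi
```

This avoids the classic mistake of "checking" a process by sending it a real signal.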
+ diff --git a/2026/day-05/linux-troubleshooting-runbook.md b/2026/day-05/linux-troubleshooting-runbook.md new file mode 100644 index 0000000000..052c0ca2e1 --- /dev/null +++ b/2026/day-05/linux-troubleshooting-runbook.md @@ -0,0 +1,346 @@ +# Linux Troubleshooting Runbook: Process Health Check + +--- + +## Target Service Overview + +**Service Name:** process_api +**Process ID:** 1 +**Command:** `/process_api --addr 0.0.0.0:2024 --max-ws-buffer-size 32768 --cpu-shares 1024 --oom-polling-period-ms 100 --memory-limit-bytes 4294967296 --block-local-connections` +**Port:** 2024 +**Purpose:** Process management API service + +--- + +## Environment Basics + +### Command 1: System Information +```bash +uname -a +``` +**Output:** +``` +Linux runsc 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux +``` +**Observation:** Running on Linux kernel 4.4.0, x86_64 architecture. System appears to be containerized (runsc - gVisor runtime). + +--- + +### Command 2: OS Version Check +```bash +cat /etc/os-release +``` +**Output:** +``` +PRETTY_NAME="Ubuntu 24.04.3 LTS" +VERSION_ID="24.04" +VERSION_CODENAME=noble +``` +**Observation:** Ubuntu 24.04.3 LTS (Noble Numbat) - Long Term Support version, current and stable. + +--- + +## Filesystem Sanity Checks + +### Command 3: Test Directory Creation & File Operations +```bash +mkdir -p /tmp/runbook-demo && \ +echo "Runbook test file created at $(date)" > /tmp/runbook-demo/test.txt && \ +cp /etc/hosts /tmp/runbook-demo/hosts-copy && \ +ls -lh /tmp/runbook-demo/ +``` +**Output:** +``` +total 1.0K +-rwxr-xr-x 1 root root 98 Jan 31 19:41 hosts-copy +-rw-r--r-- 1 root root 58 Jan 31 19:41 test.txt +``` +**Observation:** Filesystem write operations working correctly. No permission issues. Files created successfully with proper timestamps. 
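A stricter variant of this sanity check also confirms that the copied data is byte-identical, not merely that the files exist — a sketch using `cmp`, reusing the same demo paths as above:

```shell
#!/bin/sh
# Filesystem sanity check, extended: verify the copied file matches
# the original byte for byte.
mkdir -p /tmp/runbook-demo
echo "Runbook test file created at $(date)" > /tmp/runbook-demo/test.txt
cp /tmp/runbook-demo/test.txt /tmp/runbook-demo/test-copy.txt

# cmp -s is silent and exits 0 only when the files are identical.
if cmp -s /tmp/runbook-demo/test.txt /tmp/runbook-demo/test-copy.txt; then
    echo "copy verified: files are identical"
else
    echo "copy FAILED verification" >&2
    exit 1
fi
```

On a healthy filesystem this prints the "verified" line; a nonzero exit here would point at disk or filesystem corruption rather than a permissions problem.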
+ +--- + +### Command 4: Verify File Content +```bash +cat /tmp/runbook-demo/test.txt +``` +**Output:** +``` +Runbook test file created at Sat Jan 31 19:41:33 UTC 2026 +``` +**Observation:** File content persisted correctly. No data corruption detected. + +--- + +## CPU & Memory Snapshot + +### Command 5: Process-Specific Resource Usage +```bash +ps -o pid,pcpu,pmem,comm -p 1 +``` +**Output:** +``` + PID %CPU %MEM COMMAND + 1 2.5 0.1 process_api +``` +**Observation:** Process is consuming 2.5% CPU and 0.1% memory - normal idle state. No CPU spikes observed. + +--- + +### Command 6: System-Wide Memory Usage +```bash +free -h +``` +**Output:** +``` + total used free shared buff/cache available +Mem: 9.0Gi 13Mi 9.0Gi 0B 8.2Mi 9.0Gi +Swap: 0B 0B 0B +``` +**Observation:** Excellent memory health. Only 13MB used out of 9GB. No swap configured (containerized environment). 99% memory available. + +--- + +### Command 7: Top Process Overview +```bash +top -b -n 1 | head -20 +``` +**Output:** +``` +top - 19:41:52 up 0 min, 0 user, load average: 0.00, 0.00, 0.00 +Tasks: 4 total, 1 running, 3 sleeping, 0 stopped, 0 zombie +%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si +MiB Mem : 9216.0 total, 9200.0 free, 16.0 used, 8.6 buff/cache + + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 1 root 20 0 487844 22916 0 S 10.0 0.2 0:00.72 process_api +``` +**Observation:** System just booted (0 min uptime). Load average 0.00 indicates no stress. CPU 100% idle. Only 4 total tasks running - minimal overhead. + +--- + +### Command 8: System Load & Uptime +```bash +uptime +``` +**Output:** +``` +19:42:33 up 1 min, 0 user, load average: 0.00, 0.00, 0.00 +``` +**Observation:** System freshly started. Load averages all at 0.00 (1min, 5min, 15min) - no load stress. 
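The same figures can be read programmatically from `/proc/loadavg`, which is useful when this check needs to run from a monitoring script — a sketch, treating "1-minute load above the CPU count" as a rough rule-of-thumb threshold:

```shell
#!/bin/sh
# The first three fields of /proc/loadavg are the 1/5/15-minute
# load averages, the same numbers uptime prints.
read -r load1 load5 load15 _ < /proc/loadavg
cpus=$(nproc)
echo "1-min load: $load1 on $cpus CPU(s)"

# awk handles the floating-point comparison that plain sh cannot.
if [ "$(awk -v l="$load1" -v c="$cpus" 'BEGIN { print (l > c) }')" = "1" ]; then
    echo "WARNING: 1-min load exceeds CPU count"
else
    echo "load is within CPU capacity"
fi
```

On this freshly booted system the load is 0.00, so the script reports the load as within capacity.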
+
+---
+
+## Disk & I/O Snapshot
+
+### Command 9: Disk Space Usage
+```bash
+df -h
+```
+**Output:**
+```
+Filesystem      Size  Used Avail Use% Mounted on
+none            9.9G  2.2M  9.9G   1% /
+none            315G     0  315G   0% /dev
+none            1.0P     0  1.0P   0% /mnt/transcripts
+```
+**Observation:** Excellent disk health. Root filesystem only 1% used (2.2MB of 9.9GB). All critical partitions have abundant space.
+
+---
+
+### Command 10: Log Directory Size
+```bash
+du -sh /var/log
+```
+**Output:**
+```
+967K	/var/log
+```
+**Observation:** Log directory consuming less than 1MB. No log accumulation issues. System logs are healthy and not filling disk.
+
+---
+
+## Network Snapshot
+
+### Command 11: TCP Connections
+```bash
+cat /proc/net/tcp | head -10
+```
+**Output:**
+```
+  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
+   3: 00000000:07E8 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 4
+  87: 08000015:07E8 5E03040A:EB48 01 00000000:00000000 00:00000000 00000000     0        0 88
+```
+**Observation:** Port 0x07E8 (2024 decimal) is listening (state 0A = LISTEN). One established connection detected (state 01). No queue backlogs.
+
+---
+
+### Command 12: Service Endpoint Test
+```bash
+curl -I http://127.0.0.1:2024 --max-time 5
+```
+**Output:**
+```
+curl: (56) Recv failure: Connection reset by peer
+```
+**Observation:** The HTTP request to process_api on port 2024 is reset. This is expected behavior, not a fault: the service runs with `--block-local-connections`, so requests from 127.0.0.1 are rejected by design. (The service may also speak WebSocket rather than plain HTTP, so `curl -I` alone is not a reliable health signal here.)
+
+---
+
+## Logs Reviewed
+
+### Command 13: System Package Logs
+```bash
+tail -n 50 /var/log/dpkg.log | head -30
+```
+**Output:**
+```
+2025-11-21 01:59:15 status installed glib-networking:amd64 2.80.0-1build1
+2025-11-21 01:59:15 status installed libsoup-3.0-0:amd64 3.4.4-5ubuntu0.5
+[... additional package installation logs ...]
+```
+**Observation:** Last package operations were successful installations on 2025-11-21.
No package failures or dpkg errors. System packages are stable. + +--- + +### Command 14: Kernel Messages +```bash +dmesg | tail -50 +``` +**Output:** +``` +[ 0.000000] Starting gVisor... +[ 0.570863] Checking naughty and nice process list... +[ 3.070288] Ready! +``` +**Observation:** Clean boot sequence. gVisor container runtime initialized successfully. No kernel panics, OOM kills, or hardware errors. System is "Ready!" + +--- + +## Findings Summary + +### Healthy Indicators: +1. **CPU Usage:** 2.5% - well within normal range +2. **Memory Usage:** 0.1% (13MB/9GB) - excellent +3. **Disk Space:** 1% utilization on root - no space concerns +4. **Load Average:** 0.00 across all intervals - no system stress +5. **Logs:** Clean, no errors +6. **Filesystem:** Read/write operations functioning correctly + +### Health Status: **HEALTHY** + +**Overall Assessment:** The process_api service is running normally with excellent resource utilization. All critical subsystems (CPU, memory, disk, logs) show healthy metrics. The network connection reset is likely intentional based on service configuration. 
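One practical note for the next drill: `/proc/net/tcp` (Command 11) prints addresses and ports in hexadecimal, and the conversion is easy to script rather than working out by hand — shown here for this service's port 2024:

```shell
#!/bin/sh
# Convert the service port between decimal and the hex form shown
# in /proc/net/tcp.
port=2024
hex=$(printf '%04X' "$port")
echo "decimal $port -> hex $hex"

back=$(printf '%d' "0x$hex")
echo "hex $hex -> decimal $back"

# The hex value is what you grep for in the socket table:
# grep ":$hex" /proc/net/tcp
```

Running it prints `decimal 2024 -> hex 07E8` and `hex 07E8 -> decimal 2024`, matching the `07E8` seen in the Command 11 output.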
+
+---
+
+## Escalation Steps
+
+### Step 1: Increase Monitoring Granularity
+**If CPU spikes above 80% or memory above 50%:**
+```bash
+# Monitor process in real-time
+watch -n 1 'ps -o pid,pcpu,pmem,rss,vsz,cmd -p 1'
+
+# Check for memory leaks
+cat /proc/1/status | grep -i mem
+
+# Monitor file descriptors (potential leak indicator)
+ls -la /proc/1/fd | wc -l
+
+# Check thread count
+ps -o nlwp -p 1
+```
+
+---
+
+### Step 2: Enable Detailed Process Tracing
+**If service becomes unresponsive or behavior is abnormal:**
+```bash
+# Capture system calls (if strace available)
+strace -p 1 -f -e trace=network,file -o /tmp/process_api.strace
+
+# Check for blocked I/O
+cat /proc/1/stack
+
+# Monitor process state changes
+while true; do ps -o pid,state,wchan -p 1; sleep 1; done
+```
+
+---
+
+### Step 3: Log Analysis & Service Restart Strategy
+**If errors appear in logs or service is degraded:**
+```bash
+# Create full diagnostic snapshot before restart
+# (capture the timestamp once so every artifact lands in the same directory)
+INCIDENT_DIR=/tmp/incident-$(date +%Y%m%d-%H%M%S)
+mkdir -p "$INCIDENT_DIR"
+cp /var/log/* "$INCIDENT_DIR"/ 2>/dev/null
+ps aux > "$INCIDENT_DIR"/processes.txt
+free -h > "$INCIDENT_DIR"/memory.txt
+df -h > "$INCIDENT_DIR"/disk.txt
+
+# Capture last-known-good state
+cat /proc/1/cmdline > "$INCIDENT_DIR"/cmdline.txt
+cat /proc/1/environ > "$INCIDENT_DIR"/environ.txt
+
+# If using systemd (not available here but included for reference):
+# systemctl restart process-api
+# journalctl -u process-api -n 100 --no-pager
+
+# Alternative: Send SIGTERM for graceful shutdown (if restart needed)
+kill -15 1   # Graceful termination
+# Note: In this container, killing PID 1 would restart the container
+```
+
+---
+
+### Step 4: Advanced Diagnostics
+**If issue persists after restart:**
+```bash
+# Check for network port conflicts
+cat /proc/net/tcp | grep 07E8   # 2024 in hex
+
+# Verify service can bind to port
+# (This would require
restarting the service to test) + +# Check resource limits +cat /proc/1/limits + +# Monitor I/O operations +cat /proc/1/io + +# Check memory maps for shared library issues +cat /proc/1/maps | grep -E '(so|lib)' +``` + +--- + +### Step 5: Incident Documentation & Escalation +**If all troubleshooting fails:** + +1. **Document the incident:** + - Time of initial detection + - All commands run and outputs captured + - Snapshot directory location: `/tmp/incident-YYYYMMDD-HHMMSS/` + +2. **Escalate to:** + - Platform team (if infrastructure issue) + - Application team (if service-specific issue) + - Security team (if signs of compromise) + +3. **Provide artifacts:** + - Complete `/tmp/incident-*` directory + - Timeline of events + - Impact assessment (users affected, data loss, etc.) + +--- + +--- + +## Runbook Maintenance + +**Last Updated:** 2026-01-31 +**Next Review:** 2026-02-07 (weekly review recommended) +**Owner:** Rameez Ahmed + +**Change Log:** +- 2026-01-31: Initial runbook creation based on process_api health check drill \ No newline at end of file diff --git a/2026/day-06/file-io-practice.md b/2026/day-06/file-io-practice.md new file mode 100644 index 0000000000..c998f41521 --- /dev/null +++ b/2026/day-06/file-io-practice.md @@ -0,0 +1,347 @@ +# File I/O Practice - Day 06 + +**Date:** 2026-02-01 +**Goal:** Master basic Linux file read/write operations +**Duration:** 30 minutes +**Difficulty:** Beginner + +--- + +## Overview + +This practice session covers fundamental file operations that every DevOps engineer uses daily: +- Creating files +- Writing content (overwrite vs append) +- Reading entire files +- Reading specific parts of files +- Using tee for simultaneous write and display + +--- + +## Commands Practiced + +### 1. 
Creating an Empty File + +**Command:** +```bash +touch notes.txt +``` + +**What it does:** +- Creates an empty file named `notes.txt` +- If file exists, updates its timestamp +- Commonly used to initialize files or update modification times + +**Output:** +```bash +ls -lh notes.txt +-rw-r--r-- 1 root root 0 Feb 1 14:21 notes.txt +``` + +**Use Case:** Creating placeholder files, updating timestamps for Makefiles + +--- + +### 2. Writing to File (Overwrite) + +**Command:** +```bash +echo "Line 1: This is the first line written with > redirection" > notes.txt +``` + +**What it does:** +- `>` redirects output to a file +- **OVERWRITES** the entire file content +- If file doesn't exist, creates it +- **Warning:** Be careful with `>` as it destroys existing content! + +**Use Case:** Creating new config files, resetting log files + +--- + +### 3. Appending to File + +**Command:** +```bash +echo "Line 2: This line is appended with >> redirection" >> notes.txt +``` + +**What it does:** +- `>>` redirects output and **APPENDS** to file +- Preserves existing content +- Adds new content at the end +- Safe for adding to existing files + +**Use Case:** Adding log entries, appending to configuration files + +--- + +### 4. Using tee to Write and Display + +**Command:** +```bash +echo "Line 3: Added using tee command (writes and displays)" | tee -a notes.txt +``` + +**What it does:** +- `tee` reads from stdin and writes to both stdout AND file(s) +- `-a` flag appends to file (without `-a`, it overwrites) +- Shows you what was written (good for confirmation) +- Can write to multiple files: `tee file1.txt file2.txt` + +**Output:** +``` +Line 3: Added using tee command (writes and displays) +``` + +**Use Case:** +- Logging command output while viewing it +- Writing to multiple log files simultaneously +- Pipelines where you need both file output and terminal display + +--- + +### 5. 
Adding More Lines + +**Commands:** +```bash +echo "Line 4: DevOps engineers work with config files daily" >> notes.txt +echo "Line 5: Logs are critical for troubleshooting issues" >> notes.txt +echo "Line 6: Scripts automate repetitive tasks" >> notes.txt +echo "Line 7: Understanding file I/O is fundamental" >> notes.txt +echo "Line 8: Master these basics before advanced topics" >> notes.txt +``` + +**What it does:** +- Builds up our practice file with meaningful content +- Each line appended without destroying previous content + +--- + +### 6. Reading the Entire File + +**Command:** +```bash +cat notes.txt +``` + +**What it does:** +- `cat` (concatenate) displays entire file content +- Outputs all lines to stdout +- Can concatenate multiple files: `cat file1.txt file2.txt` + +**Output:** +``` +Line 1: This is the first line written with > redirection +Line 2: This line is appended with >> redirection +Line 3: Added using tee command (writes and displays) +Line 4: DevOps engineers work with config files daily +Line 5: Logs are critical for troubleshooting issues +Line 6: Scripts automate repetitive tasks +Line 7: Understanding file I/O is fundamental +Line 8: Master these basics before advanced topics +``` + +**Use Case:** Viewing small config files, quick content checks, file concatenation + +--- + +### 7. 
Reading First N Lines (head) + +**Command:** +```bash +head -n 3 notes.txt +``` + +**What it does:** +- Shows first 3 lines of file +- Default is 10 lines if `-n` not specified +- Useful for seeing file headers or recent entries (if file is time-ordered) + +**Output:** +``` +Line 1: This is the first line written with > redirection +Line 2: This line is appended with >> redirection +Line 3: Added using tee command (writes and displays) +``` + +**Alternative:** +```bash +head -n 2 notes.txt +Line 1: This is the first line written with > redirection +Line 2: This line is appended with >> redirection +``` + +**Use Case:** +- Checking CSV headers +- Viewing file format before processing +- Quick preview of large files + +--- + +### 8. Reading Last N Lines (tail) + +**Command:** +```bash +tail -n 3 notes.txt +``` + +**What it does:** +- Shows last 3 lines of file +- Default is 10 lines if `-n` not specified +- **Critical for log files** (most recent entries at the end) + +**Output:** +``` +Line 6: Scripts automate repetitive tasks +Line 7: Understanding file I/O is fundamental +Line 8: Master these basics before advanced topics +``` + +**Alternative:** +```bash +tail -n 2 notes.txt +Line 7: Understanding file I/O is fundamental +Line 8: Master these basics before advanced topics +``` + +**Use Case:** +- Checking most recent log entries +- Monitoring error logs +- Following file updates with `tail -f` (live monitoring) + +--- + +### 9. Advanced: Following Files in Real-Time + +**Command:** +```bash +tail -f /var/log/syslog +``` + +**What it does:** +- `-f` (follow) keeps tail running +- Displays new lines as they're added +- **Essential for live log monitoring** +- Exit with `Ctrl+C` + +**Use Case:** Monitoring application logs during deployments, debugging in real-time + +--- + +### 10. 
Bonus: Counting Lines + +**Command:** +```bash +wc -l notes.txt +``` + +**What it does:** +- `wc` (word count) with `-l` counts lines +- Useful for validation and file size checks + +**Output:** +``` +8 notes.txt +``` + +**Use Case:** Validating data imports, checking log file size, script verification + +--- + +## Complete Practice Session + +### Final File Content (notes.txt) +``` +Line 1: This is the first line written with > redirection +Line 2: This line is appended with >> redirection +Line 3: Added using tee command (writes and displays) +Line 4: DevOps engineers work with config files daily +Line 5: Logs are critical for troubleshooting issues +Line 6: Scripts automate repetitive tasks +Line 7: Understanding file I/O is fundamental +Line 8: Master these basics before advanced topics +``` + +**Total Lines:** 8 +**File Size:** ~420 bytes + +--- + +## Why This Matters for DevOps + +### Real-World Scenarios: + +1. **Config Management:** + ```bash + echo "ServerName example.com" >> /etc/apache2/apache2.conf + ``` + +2. **Log Monitoring:** + ```bash + tail -f /var/log/nginx/error.log + ``` + +3. **Script Output Logging:** + ```bash + ./deploy.sh | tee -a deployment.log + ``` + +4. **Checking Log Errors:** + ```bash + tail -n 100 /var/log/app.log | grep ERROR + ``` + +5. **Creating Documentation:** + ```bash + echo "# Server Inventory" > inventory.md + echo "- server1.example.com" >> inventory.md + ``` + +--- + +## Next Steps + +### Practice These Variations: + +1. **Create a script log:** + ```bash + date > script.log + echo "Starting process..." >> script.log + echo "Process complete" >> script.log + cat script.log + ``` + +2. **Monitor a growing file:** + ```bash + # Terminal 1: + tail -f growing.txt + + # Terminal 2: + echo "New entry" >> growing.txt + ``` + +3. **Save command output:** + ```bash + ps aux > processes.txt + df -h >> processes.txt + cat processes.txt + ``` + +4. 
**Use tee in a pipeline:** + ```bash + ls -la | tee directory_listing.txt | grep "\.txt$" + ``` + +--- + +## Additional Resources + +- `man cat` - Manual for cat command +- `man head` - Manual for head command +- `man tail` - Manual for tail command +- `man tee` - Manual for tee command +- `man touch` - Manual for touch command + +--- + diff --git a/2026/day-06/notes.txt b/2026/day-06/notes.txt new file mode 100644 index 0000000000..d770ab4b73 --- /dev/null +++ b/2026/day-06/notes.txt @@ -0,0 +1,8 @@ +Line 1: This is the first line written with > redirection +Line 2: This line is appended with >> redirection +Line 3: Added using tee command (writes and displays) +Line 4: DevOps engineers work with config files daily +Line 5: Logs are critical for troubleshooting issues +Line 6: Scripts automate repetitive tasks +Line 7: Understanding file I/O is fundamental +Line 8: Master these basics before advanced topics diff --git a/2026/day-06/notes_demo.txt b/2026/day-06/notes_demo.txt new file mode 100644 index 0000000000..4e2b2a561d --- /dev/null +++ b/2026/day-06/notes_demo.txt @@ -0,0 +1,2 @@ +Line 9: This demonstrates tee without append (overwrites) +Line 10: Tee can also append with -a flag diff --git a/2026/day-07/backup.sh b/2026/day-07/backup.sh new file mode 100644 index 0000000000..b616c37832 --- /dev/null +++ b/2026/day-07/backup.sh @@ -0,0 +1,3 @@ +#!/bin/bash +echo "Running backup script..." +echo "Backup complete!" 
diff --git a/2026/day-07/day-07-linux-fs-and-scenarios.md b/2026/day-07/day-07-linux-fs-and-scenarios.md new file mode 100644 index 0000000000..d6118bbbf9 --- /dev/null +++ b/2026/day-07/day-07-linux-fs-and-scenarios.md @@ -0,0 +1,602 @@ +# Day 07 – Linux File System Hierarchy & Scenario-Based Practice + +**Date:** 2026-02-01 +**Goal:** Understand Linux filesystem structure and practice real-world troubleshooting + +--- + +## Part 1: Linux File System Hierarchy (30 minutes) + +### Overview +The Linux filesystem follows the **Filesystem Hierarchy Standard (FHS)**. Everything starts from the root (`/`) directory and branches out in a tree structure. + +--- + +### Core Directories (Must Know) + +#### 1. `/` - Root Directory +**What it contains:** The starting point of the entire filesystem hierarchy +**Files/Folders observed:** +```bash +ls -l / +drwxr-xr-x 1 root root 4096 Nov 21 02:02 etc +drwxr-xr-x 1 root root 4096 Nov 21 01:53 home +drwxr-xr-x 1 root root 4096 Jan 31 19:00 mnt +drwx--S--- 1 root root 4096 Jan 31 19:41 root +drwxrwxrwt 1 root root 4096 Jan 31 19:41 tmp +``` + +**I would use this when:** +- Understanding the overall system structure +- Navigating to any location using absolute paths +- Performing system-wide operations + +--- + +#### 2. `/home` - User Home Directories +**What it contains:** Personal directories for regular (non-root) users +**Files/Folders observed:** +```bash +ls -la /home +drwxr-xr-x 1 root root 4096 Nov 21 01:53 rameez +drwxr-x--- 2 ubuntu ubuntu 100 Oct 13 14:09 ahmed +``` + +**I would use this when:** +- Accessing user-specific files, configurations, and data +- Managing user documents, scripts, or personal configs +- Setting up development environments for specific users + +**Real-world example:** `/home/rameez/.bashrc` for user-specific shell configuration + +--- + +#### 3. 
`/root` - Root User's Home Directory +**What it contains:** Home directory for the root (administrator) user +**Files/Folders observed:** +```bash +ls -la /root +drwx--S--- 1 root root 4096 Jan 31 19:41 . +-rw-r--r-- 1 root root 3106 Apr 22 2024 .bashrc +-rw-r--r-- 1 root root 161 Apr 22 2024 .profile +drwx--S--- 2 root root 40 Nov 21 01:55 .ssh +``` + +**I would use this when:** +- Running administrative tasks as root +- Storing root user's SSH keys, configs, and scripts +- Accessing root's command history and preferences + +**Security note:** This directory has restricted permissions (700) for security + +--- + +#### 4. `/etc` - Configuration Files +**What it contains:** System-wide configuration files for applications and services +**Files/Folders observed:** +```bash +ls -la /etc | head -15 +-rw-r--r-- 1 root root 3444 Jul 5 2023 adduser.conf +drwxr-xr-x 2 root root 1380 Nov 21 01:59 alternatives +drwxr-xr-x 8 root root 180 Oct 13 14:03 apt +-rw-r--r-- 1 root root 2319 Mar 31 2024 bash.bashrc +-rw-r--r-- 1 root root 0 Oct 13 14:03 hosts +-rw-r--r-- 1 root root 0 Oct 13 14:03 hostname +``` + +**Example config file:** +```bash +cat /etc/hosts +# BEGIN CONTAINER MANAGED HOSTS +127.0.0.1 localhost +127.0.0.1 runsc +# END CONTAINER MANAGED HOSTS +``` + +**I would use this when:** +- Configuring network settings (`/etc/hosts`, `/etc/resolv.conf`) +- Modifying service configurations (`/etc/nginx/nginx.conf`) +- Managing user/group information (`/etc/passwd`, `/etc/group`) +- Setting up system-wide environment variables + +**DevOps critical:** This is where 90% of configuration changes happen! + +--- + +#### 5. 
`/var/log` - Log Files +**What it contains:** System and application log files +**Files/Folders observed:** +```bash +ls -lh /var/log +-rw-r--r-- 1 root root 20K Nov 21 01:59 alternatives.log +drwxr-xr-x 2 root root 100 Nov 21 01:59 apt +-rw-r--r-- 1 root root 60K Oct 13 14:03 bootstrap.log +-rw-r--r-- 1 root root 575K Nov 21 02:00 dpkg.log +drwxr-sr-x 2 root systemd-journal 40 Nov 21 01:55 journal +``` + +**Largest log files found:** +```bash +du -sh /var/log/* 2>/dev/null | sort -h | tail -5 +5.5K /var/log/fontconfig.log +20K /var/log/alternatives.log +60K /var/log/bootstrap.log +305K /var/log/apt +575K /var/log/dpkg.log +``` + +**I would use this when:** +- Troubleshooting application failures +- Investigating security incidents +- Monitoring system events +- Debugging deployment issues +- Checking authentication attempts (`/var/log/auth.log`) + +--- + +#### 6. `/tmp` - Temporary Files +**What it contains:** Temporary files that may be deleted on reboot +**Files/Folders observed:** +```bash +ls -la /tmp | head -10 +drwxrwxrwt 1 root root 4096 Jan 31 19:41 . +drwxr-xr-x 2 root root 40 Nov 21 02:00 hsperfdata_root +drwxr-xr-x 3 root root 60 Nov 21 01:57 node-compile-cache +drwxr-xr-x 2 root root 4096 Jan 31 19:41 runbook-demo +``` + +**I would use this when:** +- Storing temporary files during script execution +- Testing file operations without affecting production data +- Creating temporary working directories for deployments +- Storing session data that doesn't need to persist + +**Note:** Files here may be deleted on system reboot. Don't store important data! + +--- + +### Additional Directories (Good to Know) + +#### 7. 
`/bin` - Essential Command Binaries +**What it contains:** Essential command binaries needed for system boot and single-user mode +**Files/Folders observed:** +```bash +ls -l /bin +lrwxrwxrwx 1 root root 7 Apr 22 2024 /bin -> usr/bin +``` + +**I would use this when:** +- Locating basic commands like `ls`, `cat`, `cp`, `grep` +- Understanding where shell commands are stored +- Troubleshooting PATH issues + +**Common binaries:** bash, cat, chmod, cp, date, echo, grep, ls, mkdir, rm, sh + +--- + +#### 8. `/usr/bin` - User Command Binaries +**What it contains:** User command binaries and applications +**Files/Folders observed:** +```bash +ls -l /usr/bin | head -10 +-rwxr-xr-x 1 root root 2064864 Oct 23 17:29 Xvfb +-rwxr-xr-x 1 root root 55744 Jun 22 2025 [ +-rwxr-xr-x 1 root root 14488 May 11 2024 acyclic +-rwxr-xr-x 1 root root 16422 Jul 2 2025 add-apt-repository +``` + +**I would use this when:** +- Running non-essential user programs +- Finding installed applications +- Checking which version of a tool is installed (`which python3`) + +**Contains:** python, git, curl, wget, vim, nano, gcc, and most user applications + +--- + +#### 9. 
`/opt` - Optional/Third-Party Applications +**What it contains:** Add-on application software packages +**Files/Folders observed:** +```bash +ls -la /opt +drwxr-xr-x 3 root root 60 Nov 21 01:57 google +drwxr-xr-x 6 root root 120 Nov 21 01:59 pw-browsers +``` + +**I would use this when:** +- Installing third-party software that doesn't follow standard paths +- Deploying custom applications (like `/opt/myapp`) +- Managing commercial software installations +- Keeping manually installed software separate from package-managed software + +**Examples:** `/opt/google/chrome`, `/opt/teamviewer`, custom enterprise apps + +--- + +### Additional Important Directories (Quick Reference) + +| Directory | Purpose | Use Case | +| --------- | ---------------------- | ------------------------------------------ | +| `/var` | Variable data files | Logs, databases, email, print queues | +| `/usr` | User programs and data | Installed applications, libraries | +| `/dev` | Device files | Hardware devices (disks, terminals) | +| `/proc` | Process information | Virtual filesystem for kernel/process info | +| `/sys` | System information | Kernel and hardware configuration | +| `/boot` | Boot loader files | Kernel, initrd, bootloader config | +| `/lib` | Essential libraries | Shared libraries for /bin and /sbin | +| `/mnt` | Mount points | Temporary mount points for filesystems | +| `/media` | Removable media | USB drives, CD-ROMs auto-mounted here | +| `/srv` | Service data | Data for services (web, FTP) | + +--- + +## Part 2: Scenario-Based Practice (40 minutes) + +### Understanding the Approach + +**Key principle:** Follow a systematic troubleshooting flow: +1. **Observe** - What's the current state? +2. **Investigate** - Gather data and logs +3. **Diagnose** - Identify the root cause +4. **Fix** - Apply the solution +5. **Verify** - Confirm it's working + +--- + +### SOLVED EXAMPLE: Check if a Service is Running + +**Scenario:** How do you check if the 'nginx' service is running? 
+
+#### My Solution (Step by step):
+
+**Step 1: Check service status**
+```bash
+systemctl status nginx
+```
+**Why this command?** Shows whether the service is active, failed, or stopped, along with recent log entries
+
+**Expected output patterns:**
+- `Active: active (running)` → Service is working
+- `Active: failed` → Service crashed, check logs
+- `Active: inactive (dead)` → Service is stopped
+
+---
+
+**Step 2: If service is not found, list all services**
+```bash
+systemctl list-units --type=service | grep nginx
+```
+**Why this command?** Confirms whether the service exists on the system
+
+---
+
+**Step 3: Check if service is enabled on boot**
+```bash
+systemctl is-enabled nginx
+```
+**Why this command?** Determines if it will start automatically after reboot
+
+**Output meanings:**
+- `enabled` → Starts on boot
+- `disabled` → Won't start on boot
+- `static` → Cannot be enabled directly (no `[Install]` section); started by another unit
+
+---
+
+**What I learned:**
+Always check status first, then investigate based on what you see. The status command gives 80% of the info you need.
+
+---
+
+### Scenario 1: Service Not Starting
+
+**Problem:** A web application service called 'myapp' failed to start after a server reboot. What commands would you run to diagnose the issue?
+
+#### Solution:
+
+**Step 1: Check if service is running or failed**
+```bash
+systemctl status myapp
+```
+**Why:** Shows current state (active/failed/inactive) and recent log entries
+**Look for:** Exit codes, error messages in the status output
+
+---
+
+**Step 2: Check if service is enabled on boot**
+```bash
+systemctl is-enabled myapp
+```
+**Why:** Determines if service will auto-start after reboot
+**If disabled:** Service won't start automatically - this might be the issue!
+ +--- + +**Step 3: View recent logs for error messages** +```bash +journalctl -u myapp -n 50 +``` +**Why:** Shows last 50 log entries to identify error messages +**Look for:** Stack traces, "failed to start", permission errors, port conflicts + +--- + +**Step 4: View logs with explanatory text** +```bash +journalctl -u myapp -xe +``` +**Why:** `-x` adds explanatory help text, `-e` jumps to end of logs +**Helps:** Understand systemd-specific errors + +--- + +**Step 5: If service is disabled, enable and start it** +```bash +systemctl enable --now myapp +``` +**Why:** Enables service for boot AND starts it immediately +**Verify with:** `systemctl status myapp` + +--- + +**Additional troubleshooting commands:** +```bash +# Check service file for misconfiguration +systemctl cat myapp + +# Reload systemd if you edited the service file +systemctl daemon-reload + +# View all failed services +systemctl --failed +``` + +--- + +### Scenario 2: High CPU Usage + +**Problem:** Your manager reports that the application server is slow. You SSH into the server. What commands would you run to identify which process is using high CPU?
+ +#### Solution: + +**Step 1: Check real-time CPU usage with top** +```bash +top +``` +**Why:** Shows live CPU usage sorted by CPU% (press 'q' to quit) +**Look for:** Process with highest %CPU in the top rows +**Key columns:** +- `PID` - Process ID +- `%CPU` - CPU usage percentage +- `%MEM` - Memory usage percentage +- `COMMAND` - Process name + +**Pro tip:** Press `1` to see individual CPU cores, `P` to sort by CPU + +--- + +**Step 2: Get sorted list of top CPU consumers** +```bash +ps aux --sort=-%cpu | head -10 +``` +**Why:** Shows top 10 processes by CPU usage with full details +**Advantage:** Non-interactive, can be saved to a file or sent in reports + +--- + +**Step 3: Note the PID and get more details** +```bash +ps -o pid,pcpu,pmem,cmd -p <PID> +``` +**Why:** Shows detailed info about specific process including full command +**Example:** `ps -o pid,pcpu,pmem,cmd -p 1234` + +--- + +**Step 4: Check if process is stuck or working normally** +```bash +top -p <PID> +``` +**Why:** Monitor specific process CPU usage over time +**Helps:** Determine if CPU spike is temporary or constant + +--- + +**Step 5: Investigate what the process is doing** +```bash +lsof -p <PID> +``` +**Why:** Lists open files and connections to understand process activity +**Shows:** Files being read/written, network connections, shared libraries + +--- + +**Additional investigation commands:** +```bash +# Check process threads +ps -eLf | grep <PID> + +# See system load average +uptime + +# Check CPU-intensive processes over time +sar -u 1 10 # if sysstat is installed +``` + +--- + +### Scenario 3: Finding Service Logs + +**Problem:** A developer asks: "Where are the logs for the 'docker' service?" The service is managed by systemd. What commands would you use?
+ +#### Solution: + +**Step 1: First check if service exists and its status** +```bash +systemctl status docker +``` +**Why:** Confirms service exists and shows brief status with recent log entries +**Output includes:** Last few log lines directly in the status + +--- + +**Step 2: View last 50 lines of service logs** +```bash +journalctl -u docker -n 50 +``` +**Why:** Shows recent log entries for the docker service +**Alternative:** `-n 100` for more lines, `-n 20` for fewer + +--- + +**Step 3: Follow logs in real-time (like tail -f)** +```bash +journalctl -u docker -f +``` +**Why:** Continuously displays new log entries as they occur +**Use case:** Monitoring during deployment or troubleshooting +**Exit:** Press `Ctrl+C` + +--- + +**Step 4: View logs with timestamp filter** +```bash +journalctl -u docker --since '1 hour ago' +``` +**Why:** Shows only logs from the last hour (useful for recent issues) +**Other examples:** +- `--since '2026-02-01 10:00:00'` +- `--since today` +- `--since yesterday` + +--- + +**Step 5: Search logs for specific errors** +```bash +journalctl -u docker | grep -i error +``` +**Why:** Filters logs to show only error messages +**Case-insensitive:** `-i` flag catches ERROR, error, Error + +--- + +**Advanced log commands:** +```bash +# Show logs between specific times +journalctl -u docker --since "2026-02-01 09:00" --until "2026-02-01 10:00" + +# Show logs with priority level +journalctl -u docker -p err # Only errors and above + +# Export logs to file +journalctl -u docker > docker-logs.txt + +# Show kernel logs +journalctl -k +``` + +--- + +### Scenario 4: File Permissions Issue + +**Problem:** A script at `/home/ahmed/backup.sh` is not executing. When you run `./backup.sh`, you get "Permission denied". What commands would you use to fix this? 
+ +#### Solution (PRACTICAL DEMONSTRATION): + +**Step 1: Check current permissions** +```bash +ls -l /home/ahmed/backup.sh +``` +**Output:** +``` +-rw-r--r-- 1 root root 68 Feb 1 14:48 backup.sh +``` +**Observation:** Notice `-rw-r--r--` (no 'x' = not executable) + +**Understanding the permissions:** +- `-` = regular file +- `rw-` = owner can read and write +- `r--` = group can only read +- `r--` = others can only read +- **Missing:** Execute permission (`x`) + +--- + +**Step 2: Try to execute the script (will fail)** +```bash +./backup.sh +``` +**Output:** +``` +/bin/sh: 16: ./backup.sh: Permission denied +``` +**Why it fails:** File doesn't have execute (`x`) permission + +--- + +**Step 3: Add execute permission** +```bash +chmod +x backup.sh +``` +**Why:** Adds execute permission for owner, group, and others +**Alternative options:** +- `chmod u+x backup.sh` - Execute for owner only +- `chmod 755 backup.sh` - rwxr-xr-x (numeric method) +- `chmod +x *.sh` - Add execute to all shell scripts + +--- + +**Step 4: Verify permissions changed** +```bash +ls -l backup.sh +``` +**Output:** +``` +-rwxr-xr-x 1 root root 68 Feb 1 14:48 backup.sh +``` +**Observation:** Now shows `-rwxr-xr-x` (has 'x' = executable) + +--- + +**Step 5: Try running it again (should work now)** +```bash +./backup.sh +``` +**Output:** +``` +Running backup script... +Backup complete! 
+``` +**Success!** Script now executes properly + +--- + +**Permission troubleshooting tips:** +```bash +# Check who owns the file +ls -l backup.sh + +# Change ownership if needed +chown user:group backup.sh + +# Give read, write, execute to owner only +chmod 700 backup.sh + +# Common script permissions +chmod 755 backup.sh # Owner: rwx, Others: r-x +chmod 644 config.txt # Owner: rw-, Others: r-- +``` + +--- + +### Common Command Patterns: + +| Task | Primary Command | Alternative | +|------|----------------|-------------| +| Service status | `systemctl status <service>` | `service <service> status` | +| View logs | `journalctl -u <service>` | `tail -f /var/log/<service>.log` | +| CPU usage | `top` | `htop`, `ps aux --sort=-%cpu` | +| Permissions | `ls -l <file>` | `stat <file>` | +| Fix permissions | `chmod +x <file>` | `chmod 755 <file>` | + +--- diff --git a/2026/day-08/day_08_nginx_deployment_on_server.md b/2026/day-08/day_08_nginx_deployment_on_server.md new file mode 100644 index 0000000000..950c818bd9 --- /dev/null +++ b/2026/day-08/day_08_nginx_deployment_on_server.md @@ -0,0 +1,276 @@ +# Day 08 – Deploying Nginx Server on AWS EC2 Instance + +> **Date:** 18 February 2026 +> **Author:** Rameez Ahmed +> **Region:** Europe (Ireland) – `eu-west-1` + +--- + +## 📋 Task Overview + +The goal of this task is to: + +1. Launch an **AWS EC2 instance** running Ubuntu. +2. **SSH** into the instance. +3. **Install Nginx** web server on the instance. +4. **Configure the Security Group** inbound rules to allow HTTP (port 80) traffic from the internet. +5. **Verify** that Nginx is accessible from a browser over the public IP. +6. **Inspect** Nginx access logs to confirm incoming traffic. + +--- + +## 🔧 Step 1 – Navigate to the EC2 Dashboard + +Open the **AWS Management Console** and navigate to the **EC2** service. From the EC2 dashboard, click the **Launch Instance** button to begin creating a new virtual server.
+ +![EC2 Dashboard – Launch Instance](src/1.png) + +--- + +## 🖥️ Step 2 – Configure the Instance + +### Name and Tags + +- **Name:** `nginx-server` + +### Application and OS Image (AMI) + +- **OS:** Ubuntu (Quick Start) +- **AMI:** Ubuntu Server 24.04 LTS (HVM), SSD Volume Type +- **AMI ID:** `ami-03446a3af42c5e74e` +- **Architecture:** 64-bit (x86) +- **Free Tier Eligible:** ✅ Yes + +![Instance Name & AMI Selection](src/2.png) + +--- + +## ⚙️ Step 3 – Choose Instance Type & Key Pair + +### Instance Type + +- **Type:** `t3.micro` (2 vCPU, 1 GiB Memory) – Free Tier Eligible + +### Key Pair (Login) + +- **Key Pair Name:** `nginx-server-key` +- Used to securely connect to the instance via SSH. + +### Network Settings + +- **VPC:** `vpc-08ee45a7e2b18123e` +- **Auto-assign Public IP:** Enabled +- **Firewall (Security Groups):** Create a new security group (`launch-wizard-1`) + - ✅ Allow SSH traffic from **Anywhere** (`0.0.0.0/0`) + +![Instance Type, Key Pair & Network Settings](src/3.png) + +--- + +## 💾 Step 4 – Configure Storage & Launch + +### Storage + +- **1x 8 GiB** – `gp3` Root volume, 3000 IOPS, Not encrypted + +After reviewing all settings, click the **Launch Instance** button. + +![Storage Configuration & Launch](src/4.png) + +--- + +## 🔗 Step 5 – Connect to the Instance via SSH + +Once the instance is in the **Running** state, go to **Instances → Connect** and select the **SSH client** tab. 
+ +### Connection Details + +| Field | Value | +| ------------------ | -------------------------------------------------------------- | +| **Instance ID** | `i-056c40c620930e8c2` (nginx-server) | +| **VPC ID** | `vpc-08ee45a7e2b18123e` | +| **Security Group** | `sg-0faf8cc6d417b51b4` (launch-wizard-1) | +| **Public DNS** | `ec2-3-255-205-188.eu-west-1.compute.amazonaws.com` | +| **Key File** | `nginx-server-key.pem` | + +### SSH Commands + +```bash +# Set correct permissions on the key file +chmod 400 "nginx-server-key.pem" + +# Connect to the instance +ssh -i "nginx-server-key.pem" ubuntu@ec2-3-255-205-188.eu-west-1.compute.amazonaws.com +``` + +![SSH Connection Instructions](src/5.png) + +--- + +## 🔐 Step 6 – SSH Into the Server + +From the local terminal, run the SSH command. On the first connection, accept the host fingerprint by typing `yes`. + +``` +$ ssh -i "nginx-server-key.pem" ubuntu@ec2-3-255-205-188.eu-west-1.compute.amazonaws.com + +The authenticity of host 'ec2-3-255-205-188.eu-west-1.compute.amazonaws.com (3.255.205.188)' can't be established. +ED25519 key fingerprint is SHA256:zsUgfvXMa+QmHm39g+ghicecu0z04mpIVDANp1RsWXY. +This key is not known by any other names. +Are you sure you want to continue connecting (yes/no/[fingerprint])? yes +``` + +![SSH Connection – Terminal](src/6.png) + +--- + +## 📦 Step 7 – Update System Packages + +Once connected, update the package lists to ensure you install the latest available versions. + +```bash +sudo apt update +``` + +![apt update output](src/7.png) + +--- + +## 🌐 Step 8 – Install Nginx + +Install the Nginx web server using the `apt` package manager. + +```bash +sudo apt install nginx +``` + +- **Nginx version installed:** `1.24.0-2ubuntu7.6` +- Packages installed: `nginx`, `nginx-common` + +![Nginx Installation](src/8.png) + +--- + +## ✅ Step 9 – Verify Nginx Status + +Check that Nginx is running and enabled using `systemctl`. 
+ +```bash +systemctl status nginx +``` + +**Output confirms:** +- **Active:** `active (running)` ✅ +- **Enabled:** `enabled` (starts on boot) +- **Main PID:** `1763` +- **Workers:** 2 worker processes + +![Nginx Status – Active and Running](src/9.png) + +--- + +## 🛡️ Step 10 – View Instance Security Details + +Navigate back to the **EC2 Instances** page in the AWS Console. Select the `nginx-server` instance and go to the **Security** tab. + +### Security Details + +| Field | Value | +| ------------------ | --------------------------------------------------- | +| **Security Group** | `sg-0faf8cc6d417b51b4` (launch-wizard-1) | +| **Owner ID** | `343644158276` | +| **Launch Time** | Wed Feb 18 2026 22:34:44 GMT+0500 (Pakistan Standard Time) | + +![Instance Security Tab](src/10.png) + +--- + +## 🔓 Step 11 – Add HTTP Inbound Rule to Security Group + +To make Nginx accessible from the internet, an HTTP inbound rule must be added to the security group. + +Navigate to **EC2 → Security Groups → sg-0faf8cc6d417b51b4 (launch-wizard-1) → Edit inbound rules**. + +### Inbound Rules Configuration + +| Security Group Rule ID | Type | Protocol | Port Range | Source | +| ----------------------- | ---- | -------- | ---------- | --------------- | +| `sgr-0750c27ffe5256d59` | SSH | TCP | 22 | `0.0.0.0/0` | +| *(new rule)* | HTTP | TCP | 80 | Anywhere (`0.0.0.0/0`) | + +Click **Save rules** to apply the changes. + +![Edit Inbound Rules – Adding HTTP Port 80](src/11.png) + +--- + +## 🎉 Step 12 – Access Nginx from the Browser + +Open a web browser and navigate to the instance's **Public IP address**: + +``` +http://3.255.205.188 +``` + +The default **"Welcome to nginx!"** page is displayed, confirming that Nginx is successfully installed, running, and accessible from the internet. 
🎉 + +![Nginx Welcome Page in Browser](src/12.png) + +--- + +## 📊 Step 13 – Check Nginx Access Logs + +To confirm that the incoming HTTP requests are being logged, inspect the Nginx access log: + +```bash +cat /var/log/nginx/access.log +tail -f /var/log/nginx/access.log +``` + +The logs show successful `GET /` requests (HTTP 200) from the browser, confirming real traffic is hitting the server. + +![Nginx Access Logs – cat & tail](src/13.png) + +--- + +## 📝 Step 14 – Extract Nginx Logs + +Optionally, copy the access logs to a file in the home directory for archival or further analysis: + +```bash +cat /var/log/nginx/access.log >> /home/ubuntu/nginx_extracted.log +cat /home/ubuntu/nginx_extracted.log +``` + +![Extracted Nginx Logs](src/14.png) + +--- + +## 📌 Summary + +| Step | Action | Status | +| ---- | ------------------------------------------- | ------ | +| 1 | Navigate to EC2 Dashboard | ✅ | +| 2 | Configure instance name & AMI (Ubuntu 24.04)| ✅ | +| 3 | Select instance type (`t3.micro`) & key pair| ✅ | +| 4 | Configure storage (8 GiB gp3) & launch | ✅ | +| 5 | Get SSH connection details | ✅ | +| 6 | SSH into the instance | ✅ | +| 7 | Update system packages (`apt update`) | ✅ | +| 8 | Install Nginx (`apt install nginx`) | ✅ | +| 9 | Verify Nginx is active & running | ✅ | +| 10 | Review Security Group details | ✅ | +| 11 | Add HTTP (port 80) inbound rule | ✅ | +| 12 | Access Nginx welcome page via browser | ✅ | +| 13 | Inspect Nginx access logs | ✅ | +| 14 | Extract and archive Nginx logs | ✅ | + +--- + +## 🔑 Key Takeaways + +- **EC2** provides on-demand virtual servers in the cloud with flexible instance types. +- **Security Groups** act as virtual firewalls — by default, only SSH (port 22) is allowed; HTTP (port 80) must be explicitly added to serve web traffic. +- **Nginx** is a lightweight, high-performance web server that can be installed with a single `apt install` command on Ubuntu. 
+- Always verify services with `systemctl status` and confirm network accessibility through the browser. +- **Access logs** (`/var/log/nginx/access.log`) are invaluable for monitoring and debugging incoming traffic. diff --git a/2026/day-08/src/1.png b/2026/day-08/src/1.png new file mode 100644 index 0000000000..36698a8454 Binary files /dev/null and b/2026/day-08/src/1.png differ diff --git a/2026/day-08/src/10.png b/2026/day-08/src/10.png new file mode 100644 index 0000000000..2c1f4729b0 Binary files /dev/null and b/2026/day-08/src/10.png differ diff --git a/2026/day-08/src/11.png b/2026/day-08/src/11.png new file mode 100644 index 0000000000..21367ebded Binary files /dev/null and b/2026/day-08/src/11.png differ diff --git a/2026/day-08/src/12.png b/2026/day-08/src/12.png new file mode 100644 index 0000000000..2ee8ab5e73 Binary files /dev/null and b/2026/day-08/src/12.png differ diff --git a/2026/day-08/src/13.png b/2026/day-08/src/13.png new file mode 100644 index 0000000000..5d9a844895 Binary files /dev/null and b/2026/day-08/src/13.png differ diff --git a/2026/day-08/src/14.png b/2026/day-08/src/14.png new file mode 100644 index 0000000000..196a3e7a4d Binary files /dev/null and b/2026/day-08/src/14.png differ diff --git a/2026/day-08/src/2.png b/2026/day-08/src/2.png new file mode 100644 index 0000000000..203eb603e6 Binary files /dev/null and b/2026/day-08/src/2.png differ diff --git a/2026/day-08/src/3.png b/2026/day-08/src/3.png new file mode 100644 index 0000000000..ac3dbf1963 Binary files /dev/null and b/2026/day-08/src/3.png differ diff --git a/2026/day-08/src/4.png b/2026/day-08/src/4.png new file mode 100644 index 0000000000..21626b6294 Binary files /dev/null and b/2026/day-08/src/4.png differ diff --git a/2026/day-08/src/5.png b/2026/day-08/src/5.png new file mode 100644 index 0000000000..2b733207c8 Binary files /dev/null and b/2026/day-08/src/5.png differ diff --git a/2026/day-08/src/6.png b/2026/day-08/src/6.png new file mode 100644 index 
0000000000..e5ab02ce17 Binary files /dev/null and b/2026/day-08/src/6.png differ diff --git a/2026/day-08/src/7.png b/2026/day-08/src/7.png new file mode 100644 index 0000000000..507b21cadf Binary files /dev/null and b/2026/day-08/src/7.png differ diff --git a/2026/day-08/src/8.png b/2026/day-08/src/8.png new file mode 100644 index 0000000000..d611dc1892 Binary files /dev/null and b/2026/day-08/src/8.png differ diff --git a/2026/day-08/src/9.png b/2026/day-08/src/9.png new file mode 100644 index 0000000000..99c9473a22 Binary files /dev/null and b/2026/day-08/src/9.png differ diff --git a/2026/day-09/day-09-user-management.md b/2026/day-09/day-09-user-management.md new file mode 100644 index 0000000000..c76f3d66c0 --- /dev/null +++ b/2026/day-09/day-09-user-management.md @@ -0,0 +1,341 @@ +# Day 09 – Linux User & Group Management Challenge + +**Date:** 2026-02-18 +**Author:** Rameez Ahmed +**Challenge:** Practice user and group management on a Linux system + +--- + +## 📋 Overview + +Today's challenge focused on **Linux user and group management**, a fundamental skill for any DevOps engineer. Managing users, groups, and permissions is critical for maintaining secure systems, controlling access to resources, and following the **principle of least privilege** in production environments. 
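Because group membership lives in plain-text records (`/etc/group`), assignments like the ones in this challenge can be audited with ordinary text tools. A minimal sketch over sample data — the temp-file path and records are illustrative, mirroring the groups used below:

```shell
# Write sample records in /etc/group format: name:password:GID:member,member,...
cat > /tmp/sample_group <<'EOF'
developers:x:1004:tokyo,berlin
admins:x:1005:berlin,professor
EOF

# Expand each record into one "user -> group" line per member,
# i.e. the same information `groups <user>` reports, seen from the group side
awk -F: '{
  n = split($4, members, ",")
  for (i = 1; i <= n; i++)
    if (members[i] != "") print members[i], "->", $1
}' /tmp/sample_group
# prints:
# tokyo -> developers
# berlin -> developers
# berlin -> admins
# professor -> admins
```

Pointing the same one-liner at the real `/etc/group` gives a quick membership audit without running `groups` once per user.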
+ +--- + +## 👥 Users & Groups Created + +### Users +| Username | Home Directory | Purpose | +|-------------|----------------------|------------------------------| +| `tokyo` | `/home/tokyo` | Developer user | +| `berlin` | `/home/berlin` | Developer + Admin user | +| `professor` | `/home/professor` | Admin user | +| `nairobi` | `/home/nairobi` | Project team member | + +### Groups +| Group Name | Purpose | +|----------------|-------------------------------| +| `developers` | Development team group | +| `admins` | Administrative team group | +| `project-team` | Cross-functional project team | + +--- + +## 🔧 Group Assignments + +| User | Primary Group | Additional Groups | +|-------------|---------------|----------------------------| +| `tokyo` | `tokyo` | `developers`, `project-team` | +| `berlin` | `berlin` | `developers`, `admins` | +| `professor` | `professor` | `admins` | +| `nairobi` | `nairobi` | `project-team` | + +--- + +## 📁 Directories Created + +| Directory | Group Owner | Permissions | Description | +|------------------------|----------------|-------------|-------------------------| +| `/opt/dev-project` | `developers` | `775` (rwxrwxr-x) | Shared dev workspace | +| `/opt/team-workspace` | `project-team` | `775` (rwxrwxr-x) | Team collaboration space | + +--- + +## 🛠️ Commands Used + +### Task 1: Create Users (with home directories and passwords) + +```bash +# Create users with home directories (-m flag) +sudo useradd -m tokyo +sudo useradd -m berlin +sudo useradd -m professor + +# Set passwords for each user +sudo passwd tokyo +sudo passwd berlin +sudo passwd professor +``` + +#### Verification: + +```bash +# Check /etc/passwd for the newly created users +cat /etc/passwd | grep -E "tokyo|berlin|professor" +``` + +**Expected Output:** +``` +tokyo:x:1001:1001::/home/tokyo:/bin/sh +berlin:x:1002:1002::/home/berlin:/bin/sh +professor:x:1003:1003::/home/professor:/bin/sh +``` + +```bash +# Verify home directories were created +ls -la /home/ | grep -E 
"tokyo|berlin|professor" +``` + +**Expected Output:** +``` +drwxr-xr-x 2 tokyo tokyo 4096 Feb 18 23:00 tokyo +drwxr-xr-x 2 berlin berlin 4096 Feb 18 23:00 berlin +drwxr-xr-x 2 professor professor 4096 Feb 18 23:00 professor +``` + +--- + +### Task 2: Create Groups + +```bash +# Create the developers and admins groups +sudo groupadd developers +sudo groupadd admins +``` + +#### Verification: + +```bash +# Check /etc/group for the newly created groups +cat /etc/group | grep -E "developers|admins" +``` + +**Expected Output:** +``` +developers:x:1004: +admins:x:1005: +``` + +--- + +### Task 3: Assign Users to Groups + +```bash +# Add tokyo to developers group +sudo usermod -aG developers tokyo + +# Add berlin to both developers AND admins groups +sudo usermod -aG developers berlin +sudo usermod -aG admins berlin + +# Add professor to admins group +sudo usermod -aG admins professor +``` + +> **⚠️ Important:** The `-aG` flags are critical here: +> - `-a` = **append** (without this, the user would be removed from all other supplementary groups) +> - `-G` = supplementary **groups** + +#### Verification: + +```bash +# Check group memberships for each user +groups tokyo +groups berlin +groups professor +``` + +**Expected Output:** +``` +tokyo : tokyo developers +berlin : berlin developers admins +professor : professor admins +``` + +```bash +# Alternative: Check from the group side +cat /etc/group | grep -E "developers|admins" +``` + +**Expected Output:** +``` +developers:x:1004:tokyo,berlin +admins:x:1005:berlin,professor +``` + +--- + +### Task 4: Shared Directory (`/opt/dev-project`) + +```bash +# Step 1: Create the shared directory +sudo mkdir -p /opt/dev-project + +# Step 2: Set group owner to developers +sudo chgrp developers /opt/dev-project + +# Step 3: Set permissions to 775 (rwxrwxr-x) +sudo chmod 775 /opt/dev-project +``` + +#### Verification: + +```bash +# Check the directory permissions and ownership +ls -ld /opt/dev-project +``` + +**Expected Output:** +``` 
+drwxrwxr-x 2 root developers 4096 Feb 18 23:05 /opt/dev-project +``` + +```bash +# Test file creation as tokyo (member of developers) +sudo -u tokyo touch /opt/dev-project/tokyo-file.txt + +# Test file creation as berlin (member of developers) +sudo -u berlin touch /opt/dev-project/berlin-file.txt + +# Verify the files were created +ls -la /opt/dev-project/ +``` + +**Expected Output:** +``` +total 8 +drwxrwxr-x 2 root developers 4096 Feb 18 23:06 . +drwxr-xr-x 4 root root 4096 Feb 18 23:05 .. +-rw-r--r-- 1 berlin berlin 0 Feb 18 23:06 berlin-file.txt +-rw-r--r-- 1 tokyo tokyo 0 Feb 18 23:06 tokyo-file.txt +``` + +--- + +### Task 5: Team Workspace + +```bash +# Step 1: Create user nairobi with home directory +sudo useradd -m nairobi +sudo passwd nairobi + +# Step 2: Create group project-team +sudo groupadd project-team + +# Step 3: Add nairobi and tokyo to project-team +sudo usermod -aG project-team nairobi +sudo usermod -aG project-team tokyo + +# Step 4: Create team workspace directory +sudo mkdir -p /opt/team-workspace + +# Step 5: Set group to project-team and permissions to 775 +sudo chgrp project-team /opt/team-workspace +sudo chmod 775 /opt/team-workspace +``` + +#### Verification: + +```bash +# Check directory permissions +ls -ld /opt/team-workspace +``` + +**Expected Output:** +``` +drwxrwxr-x 2 root project-team 4096 Feb 18 23:08 /opt/team-workspace +``` + +```bash +# Verify group memberships +groups nairobi +groups tokyo +``` + +**Expected Output:** +``` +nairobi : nairobi project-team +tokyo : tokyo developers project-team +``` + +```bash +# Test file creation as nairobi +sudo -u nairobi touch /opt/team-workspace/nairobi-file.txt + +# Verify the file was created +ls -la /opt/team-workspace/ +``` + +**Expected Output:** +``` +total 8 +drwxrwxr-x 2 root project-team 4096 Feb 18 23:09 . +drwxr-xr-x 5 root root 4096 Feb 18 23:08 .. 
+-rw-r--r-- 1 nairobi nairobi 0 Feb 18 23:09 nairobi-file.txt +``` + +--- + +## 📝 Summary of All Commands + +| # | Command | Purpose | +|---|---------|---------| +| 1 | `sudo useradd -m <username>` | Create a new user with a home directory | +| 2 | `sudo passwd <username>` | Set or update a user's password | +| 3 | `sudo groupadd <groupname>` | Create a new group | +| 4 | `sudo usermod -aG <group> <username>` | Add a user to a supplementary group | +| 5 | `sudo mkdir -p <directory>` | Create a directory (and any parent directories) | +| 6 | `sudo chgrp <group> <directory>` | Change the group owner of a directory | +| 7 | `sudo chmod 775 <directory>` | Set permissions to rwxrwxr-x | +| 8 | `groups <username>` | Check which groups a user belongs to | +| 9 | `sudo -u <username> <command>` | Run a command as a different user | +| 10| `ls -ld <directory>` | Check directory permissions and ownership | +| 11| `cat /etc/passwd` | View user account information | +| 12| `cat /etc/group` | View group information | + +--- + +## 💡 What I Learned + +### 1. The Critical Importance of the `-a` Flag with `usermod -G` +Without the `-a` (append) flag, `usermod -G` **replaces** all existing supplementary groups. This is a common and dangerous mistake that can lock users out of critical groups. Always use `-aG` together when adding users to additional groups. + +### 2. Group-Based Access Control is Foundational to DevOps +In production environments, directories like `/opt/dev-project` mirror real-world scenarios such as: +- **Shared deployment directories** where multiple team members need write access +- **Log directories** accessible to monitoring tools but restricted from general users +- **CI/CD pipelines** where build agents need specific group permissions to deploy artifacts + +### 3.
Permission Model: The Numeric (Octal) System +The `775` permission breaks down as: +- **7** (owner) = read (4) + write (2) + execute (1) = `rwx` +- **7** (group) = read (4) + write (2) + execute (1) = `rwx` +- **5** (others) = read (4) + execute (1) = `r-x` + +This ensures that the **owner** and **group members** have full access, while **others** can only read and traverse the directory — a balanced approach for team collaboration. + +--- + +## 🏗️ Real-World DevOps Use Cases + +| Concept | Real-World Application | +|---------|----------------------| +| User management | Creating service accounts for applications (e.g., `nginx`, `postgres`) | +| Group permissions | Controlling access to deployment directories, secrets, and configs | +| Shared directories | CI/CD artifact storage, shared logs, team project workspaces | +| Principle of least privilege | Ensuring users/services only have access to what they need | + +--- + +## 🔍 Troubleshooting Tips + +| Issue | Solution | +|-------|----------| +| Permission denied | Use `sudo` before the command | +| User can't access directory | Check group membership with `groups username` | +| Permission looks correct but access denied | User may need to **log out and back in** for group changes to take effect | +| Need to check directory permissions | Use `ls -ld /path/to/directory` | +| User accidentally removed from groups | Use `usermod -aG` (with `-a` flag!) to re-add | + +--- + diff --git a/2026/day-10/day-10-file-permissions.md b/2026/day-10/day-10-file-permissions.md new file mode 100644 index 0000000000..6beadd356b --- /dev/null +++ b/2026/day-10/day-10-file-permissions.md @@ -0,0 +1,492 @@ +# Day 10 – File Permissions & File Operations Challenge + +**Date:** 2026-02-18 +**Author:** Rameez Ahmed +**Challenge:** Master file permissions and basic file operations in Linux + +--- + +## 📋 Overview + +Today's challenge focused on **file permissions and file operations** — the backbone of Linux security. 
Every file and directory in Linux has an associated set of permissions that determine who can read, write, or execute it. Understanding this system is essential for DevOps engineers to secure servers, manage deployments, and troubleshoot access issues. + +--- + +## 📁 Files Created + +| File Name | Method Used | Purpose | +|----------------|-------------------|--------------------------------| +| `devops.txt` | `touch` | Empty file for permission practice | +| `notes.txt` | `echo` / `cat >` | File with content | +| `script.sh` | `vim` | Shell script for execution test | +| `project/` | `mkdir` | Directory with custom permissions | + +--- + +## 🛠️ Commands Used + +### Task 1: Create Files + +```bash +# 1. Create an empty file using touch +touch devops.txt + +# 2. Create notes.txt with some content using echo +echo "This is my DevOps learning journal - Day 10" > notes.txt + +# Alternative: Create notes.txt using cat with heredoc +cat > notes.txt << EOF +This is my DevOps learning journal - Day 10 +Learning about file permissions and operations +Linux permissions are critical for system security +EOF + +# 3. Create script.sh using vim with a simple script +vim script.sh +# Inside vim, type: +# echo "Hello DevOps" +# Save and exit with :wq +``` + +> **💡 Quick Tip:** You can also create `script.sh` without opening vim: +> ```bash +> echo 'echo "Hello DevOps"' > script.sh +> ``` + +#### Verification: + +```bash +ls -l devops.txt notes.txt script.sh +``` + +**Expected Output:** +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:30 devops.txt +-rw-r--r-- 1 rameez rameez 112 Feb 18 23:30 notes.txt +-rw-r--r-- 1 rameez rameez 21 Feb 18 23:30 script.sh +``` + +> **📌 Note:** By default, new files are created with `644` (`rw-r--r--`) permissions. The `umask` value (usually `022`) determines the default permissions. + +--- + +### Task 2: Read Files + +```bash +# 1. 
Read notes.txt using cat +cat notes.txt +``` + +**Expected Output:** +``` +This is my DevOps learning journal - Day 10 +Learning about file permissions and operations +Linux permissions are critical for system security +``` + +```bash +# 2. View script.sh in vim read-only mode +vim -R script.sh +# Or use the shortcut: +view script.sh +``` + +> **📌 Note:** In read-only mode, vim will warn you if you try to modify the file. Press `:q` to exit. + +```bash +# 3. Display first 5 lines of /etc/passwd using head +head -n 5 /etc/passwd +``` + +**Expected Output:** +``` +root:x:0:0:root:/root:/bin/bash +daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin +bin:x:2:2:bin:/bin:/usr/sbin/nologin +sys:x:3:3:sys:/dev:/usr/sbin/nologin +sync:x:4:65534:sync:/bin:/bin/sync +``` + +```bash +# 4. Display last 5 lines of /etc/passwd using tail +tail -n 5 /etc/passwd +``` + +**Expected Output:** +``` +(last 5 users on the system will be displayed here) +``` + +--- + +### Task 3: Understand Permissions + +#### The Linux Permission Model + +Every file/directory in Linux has three types of permissions for three categories of users: + +``` + - rwx rwx rwx + │ │ │ │ + │ │ │ └── Others (everyone else) + │ │ └─────── Group (users in the file's group) + │ └──────────── Owner (the file creator) + └──────────────── File type (- = file, d = directory, l = symlink) +``` + +#### Permission Values (Octal) + +| Symbol | Value | Meaning | +|--------|-------|-------------------------------| +| `r` | 4 | **Read** – View file content / list directory | +| `w` | 2 | **Write** – Modify file / add/remove files in directory | +| `x` | 1 | **Execute** – Run file as program / enter directory | +| `-` | 0 | **No permission** | + +#### Common Permission Combinations + +| Octal | Symbolic | Meaning | +|-------|-------------|-----------------------------------| +| `777` | `rwxrwxrwx` | Full access for everyone | +| `755` | `rwxr-xr-x` | Owner: full, Others: read+execute | +| `644` | `rw-r--r--` | Owner: read+write, 
Others: read | +| `640` | `rw-r-----` | Owner: read+write, Group: read | +| `600` | `rw-------` | Owner only: read+write | +| `444` | `r--r--r--` | Read-only for everyone | +| `000` | `---------` | No access for anyone | + +#### Check Current Permissions: + +```bash +ls -l devops.txt notes.txt script.sh +``` + +**Expected Output (Default):** +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:30 devops.txt +-rw-r--r-- 1 rameez rameez 112 Feb 18 23:30 notes.txt +-rw-r--r-- 1 rameez rameez 21 Feb 18 23:30 script.sh +``` + +#### Analysis of Default Permissions (`644` / `rw-r--r--`): + +| File | Owner (rameez) | Group (rameez) | Others | +|--------------|----------------|----------------|--------| +| `devops.txt` | Read + Write | Read only | Read only | +| `notes.txt` | Read + Write | Read only | Read only | +| `script.sh` | Read + Write | Read only | Read only | + +> **⚠️ Key Observation:** None of the files have **execute** (`x`) permission by default! This means `script.sh` **cannot** be run directly with `./script.sh` until we explicitly add execute permission. + +--- + +### Task 4: Modify Permissions + +#### 4.1 – Make `script.sh` Executable + +```bash +# Before: Check current permissions +ls -l script.sh +``` +``` +-rw-r--r-- 1 rameez rameez 21 Feb 18 23:30 script.sh +``` + +```bash +# Add execute permission +chmod +x script.sh + +# After: Verify the change +ls -l script.sh +``` +``` +-rwxr-xr-x 1 rameez rameez 21 Feb 18 23:30 script.sh +``` + +```bash +# Now run the script! +./script.sh +``` +``` +Hello DevOps +``` + +> **📌 Note:** `chmod +x` adds execute permission for **all** (owner, group, others). To add only for the owner, use `chmod u+x script.sh`.
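The before/after `ls -l` checks in this task can be condensed with `stat`, which prints the octal and symbolic forms side by side. A sketch — the file path is illustrative, and the final mode assumes the common default `umask 022`:

```shell
f=/tmp/demo-script.sh
printf '%s\n' '#!/bin/sh' 'echo "Hello DevOps"' > "$f"
umask 022                      # assume the common default mask

chmod 644 "$f"
stat -c '%a %A' "$f"           # 644 -rw-r--r--  (the usual default for new files)

chmod u+x "$f"                 # execute for the owner only
stat -c '%a %A' "$f"           # 744 -rwxr--r--

chmod +x "$f"                  # execute wherever umask does not mask it
stat -c '%a %A' "$f"           # 755 -rwxr-xr-x

"$f"                           # prints: Hello DevOps
```

`stat -c '%a %A'` is quicker to eyeball after a `chmod` than decoding the `ls -l` columns by hand.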
+ +--- + +#### 4.2 – Set `devops.txt` to Read-Only + +```bash +# Before: Check current permissions +ls -l devops.txt +``` +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:30 devops.txt +``` + +```bash +# Remove write permission for ALL users (owner, group, others) +chmod a-w devops.txt + +# Alternatively, use octal notation: +# chmod 444 devops.txt + +# After: Verify the change +ls -l devops.txt +``` +``` +-r--r--r-- 1 rameez rameez 0 Feb 18 23:30 devops.txt +``` + +--- + +#### 4.3 – Set `notes.txt` to `640` + +```bash +# Before: Check current permissions +ls -l notes.txt +``` +``` +-rw-r--r-- 1 rameez rameez 112 Feb 18 23:30 notes.txt +``` + +```bash +# Set permissions to 640 (owner: rw, group: r, others: none) +chmod 640 notes.txt + +# After: Verify the change +ls -l notes.txt +``` +``` +-rw-r----- 1 rameez rameez 112 Feb 18 23:30 notes.txt +``` + +**Breakdown of `640`:** +- **6** (owner) = read (4) + write (2) = `rw-` +- **4** (group) = read (4) = `r--` +- **0** (others) = no permissions = `---` + +--- + +#### 4.4 – Create Directory `project/` with Permissions `755` + +```bash +# Create the directory +mkdir project + +# Set permissions to 755 +chmod 755 project + +# Alternatively, create and set permissions in one logical step +mkdir -m 755 project + +# Verify the directory permissions +ls -ld project/ +``` +``` +drwxr-xr-x 2 rameez rameez 4096 Feb 18 23:35 project/ +``` + +**Breakdown of `755`:** +- **7** (owner) = read (4) + write (2) + execute (1) = `rwx` +- **5** (group) = read (4) + execute (1) = `r-x` +- **5** (others) = read (4) + execute (1) = `r-x` + +> **📌 For directories:** The `x` (execute) permission means the ability to **enter** (`cd`) into the directory. Without it, users cannot access the directory even if they have read permission. 
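That directory rule is easy to verify hands-on. A small sketch, to be run as a regular (non-root) user, since root bypasses ordinary permission checks:

```shell
# Build a directory that is readable but not enterable (no x bit).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/noenter"
touch "$tmpdir/noenter/secret.txt"
chmod 644 "$tmpdir/noenter"       # rw for owner, r for group/others, no execute anywhere

# Entering fails without the x bit, even though the directory is readable
( cd "$tmpdir/noenter" ) 2>/dev/null || echo "cd: Permission denied"

chmod 755 "$tmpdir/noenter"       # restore x so cleanup can descend into it
rm -r "$tmpdir"
```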
+ +--- + +### Task 5: Test Permissions + +#### 5.1 – Writing to a Read-Only File + +```bash +# Try to write to the read-only devops.txt +echo "test" >> devops.txt +``` + +**Error Message:** +``` +bash: devops.txt: Permission denied +``` + +> **💡 Explanation:** The write permission was removed for all users (including the owner). The shell prevents any modification to the file. + +```bash +# Even redirecting output fails +echo "test" > devops.txt +``` + +**Error Message:** +``` +bash: devops.txt: Permission denied +``` + +> **⚠️ However:** The file owner can still restore write permission with `chmod u+w devops.txt` and then write to it. Root can also override this restriction. + +--- + +#### 5.2 – Executing a File Without Execute Permission + +```bash +# First, remove execute permission from script.sh +chmod -x script.sh + +# Try to execute it +./script.sh +``` + +**Error Message:** +``` +bash: ./script.sh: Permission denied +``` + +> **💡 Workaround:** Even without execute permission, the script can still be run using the interpreter directly: +> ```bash +> bash script.sh +> ``` +> This works because `bash` reads the file (it has read permission) and interprets it — the execute bit isn't checked on the file itself in this case. 
+ +--- + +#### 5.3 – Accessing a File with No Permissions + +```bash +# Remove ALL permissions from a test file +chmod 000 devops.txt + +# Try to read it +cat devops.txt +``` + +**Error Message:** +``` +cat: devops.txt: Permission denied +``` + +```bash +# Restore permissions +chmod 644 devops.txt +``` + +--- + +## 📊 Permission Changes Summary + +| File/Directory | Before (Default) | After | Command Used | +|----------------|-------------------|-------|-------------| +| `script.sh` | `-rw-r--r--` (644) | `-rwxr-xr-x` (755) | `chmod +x script.sh` | +| `devops.txt` | `-rw-r--r--` (644) | `-r--r--r--` (444) | `chmod a-w devops.txt` | +| `notes.txt` | `-rw-r--r--` (644) | `-rw-r-----` (640) | `chmod 640 notes.txt` | +| `project/` | `drwxr-xr-x` (755) | `drwxr-xr-x` (755) | `mkdir -m 755 project` | + +--- + +## 🔄 Symbolic vs Numeric (Octal) chmod + +There are two ways to use `chmod`: + +### Symbolic Mode + +```bash +chmod u+x file # Add execute for owner +chmod g-w file # Remove write for group +chmod o=r file # Set others to read-only +chmod a+x file # Add execute for all (a = all) +chmod u=rwx,g=rx,o=r file # Set specific permissions +``` + +| Symbol | Meaning | +|--------|---------| +| `u` | User (owner) | +| `g` | Group | +| `o` | Others | +| `a` | All (user + group + others) | +| `+` | Add permission | +| `-` | Remove permission | +| `=` | Set exact permission | + +### Numeric (Octal) Mode + +```bash +chmod 755 file # rwxr-xr-x +chmod 644 file # rw-r--r-- +chmod 600 file # rw------- +``` + +> **💡 Pro Tip:** Symbolic mode is better for **relative changes** (adding/removing specific permissions), while octal mode is better for **setting exact permissions** in one command. 
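As a sanity check that the two modes really are interchangeable, this sketch applies a symbolic spec and its octal equivalent and confirms they produce identical bits:

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/demo.txt"

chmod u=rwx,g=rx,o=r "$tmpdir/demo.txt"   # symbolic: owner rwx, group r-x, others r--
stat -c '%a' "$tmpdir/demo.txt"           # → 754

chmod 754 "$tmpdir/demo.txt"              # octal form of the exact same permissions
stat -c '%a' "$tmpdir/demo.txt"           # → 754 (unchanged)

rm -r "$tmpdir"
```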


---

## 📝 Summary of All Commands

| # | Command | Purpose |
|---|---------|---------|
| 1 | `touch <file>` | Create an empty file |
| 2 | `echo "text" > file` | Create a file with content (overwrites) |
| 3 | `echo "text" >> file` | Append text to a file |
| 4 | `cat > file` | Create file with interactive input (Ctrl+D to save) |
| 5 | `cat file` | Display file contents |
| 6 | `vim file` | Open file in vim editor |
| 7 | `vim -R file` / `view file` | Open file in read-only mode |
| 8 | `head -n N file` | Display first N lines of a file |
| 9 | `tail -n N file` | Display last N lines of a file |
| 10 | `ls -l` | List files with detailed permissions |
| 11 | `ls -ld <directory>` | Check directory permissions specifically |
| 12 | `chmod +x file` | Add execute permission |
| 13 | `chmod -w file` | Remove write permission |
| 14 | `chmod 755 file` | Set exact permissions using octal notation |
| 15 | `mkdir -m 755 dir` | Create directory with specific permissions |

---

## 💡 What I Learned

### 1. Default Permissions and `umask`
New files are created with `666` (rw-rw-rw-) and directories with `777` (rwxrwxrwx) as the base. The system's `umask` value (typically `022`) then **masks out** bits to give the actual permissions (for common values like `022` this behaves like simple subtraction):
- Files: `666 - 022 = 644` (rw-r--r--)
- Directories: `777 - 022 = 755` (rwxr-xr-x)

This is why new files are never created with execute permission by default — it's a security measure!

### 2. The Execute Bit Means Different Things for Files and Directories
- **For files:** `x` means the file can be run as a program or script
- **For directories:** `x` means you can **enter** the directory with `cd` and access its contents

Without `x` on a directory, a user with `r` can at most list bare filenames; they cannot `cd` into it, read any file inside, or even view the entries' metadata. This is a subtle but critical distinction!

### 3. Permission Denied ≠ Impossible
As a file owner, you can always change permissions back (using `chmod`).
And `root` can override virtually all permission restrictions. In a real DevOps environment, this means: +- **Never rely solely on file permissions** for critical security — use additional layers (SELinux, AppArmor, ACLs) +- **Service accounts** should run with minimal permissions +- **Sensitive files** (SSH keys, credentials) should be `600` (owner read/write only) + +--- + +## 🏗️ Real-World DevOps Use Cases + +| Scenario | Recommended Permission | Why | +|----------|----------------------|-----| +| SSH private key (`~/.ssh/id_rsa`) | `600` | Only the owner should read the key | +| SSH public key (`~/.ssh/id_rsa.pub`) | `644` | Everyone can read, only owner writes | +| Web server files (`/var/www/html/`) | `644` (files), `755` (dirs) | Web server reads, owner manages | +| Shell scripts in CI/CD | `755` | Must be executable by automation | +| Application config with secrets | `600` or `640` | Restrict access to sensitive data | +| Log directories | `755` | Readable for monitoring tools | +| `.env` files | `600` | Secrets must be owner-only | + +--- + +## 🔍 Troubleshooting Tips + +| Issue | Diagnosis | Solution | +|-------|-----------|----------| +| `Permission denied` when running script | Missing `x` permission | `chmod +x script.sh` | +| `Permission denied` when writing to file | Missing `w` permission | `chmod u+w file` | +| Can't `cd` into a directory | Missing `x` on directory | `chmod +x directory/` | +| SSH key rejected | Permissions too open | `chmod 600 ~/.ssh/id_rsa` | +| Web page returns 403 Forbidden | Wrong permissions on web root | `chmod 644` files, `755` dirs | +| `umask` giving unexpected defaults | Check current umask | Run `umask` to see current value | + +--- diff --git a/2026/day-11/day-11-file-ownership.md b/2026/day-11/day-11-file-ownership.md new file mode 100644 index 0000000000..525cd93b38 --- /dev/null +++ b/2026/day-11/day-11-file-ownership.md @@ -0,0 +1,443 @@ +# Day 11 – File Ownership Challenge (chown & chgrp) + +**Date:** 2026-02-18 
+**Author:** Rameez Ahmed +**Challenge:** Master file and directory ownership in Linux using `chown` and `chgrp` + +--- + +## 📋 Overview + +Today's challenge focused on **file ownership** in Linux — understanding how every file and directory has both a **user owner** and a **group owner**, and how to change them using `chown` and `chgrp`. Proper ownership management is essential for DevOps engineers managing application deployments, shared team directories, container permissions, and CI/CD pipelines. + +--- + +## 📁 Files & Directories Created + +### Files + +| File Name | Location | Purpose | +|------------------------|------------------------------------|-----------------------------------------| +| `devops-file.txt` | Working directory | Practice basic `chown` operations | +| `team-notes.txt` | Working directory | Practice basic `chgrp` operations | +| `project-config.yaml` | Working directory | Combined owner & group change practice | +| `gold.txt` | `heist-project/vault/` | Recursive ownership testing | +| `strategy.conf` | `heist-project/plans/` | Recursive ownership testing | +| `access-codes.txt` | `bank-heist/` | Final challenge — individual ownership | +| `blueprints.pdf` | `bank-heist/` | Final challenge — individual ownership | +| `escape-plan.txt` | `bank-heist/` | Final challenge — individual ownership | + +### Directories + +| Directory Name | Purpose | +|--------------------------|-------------------------------------------| +| `app-logs/` | Directory ownership change practice | +| `heist-project/` | Recursive ownership change (root dir) | +| `heist-project/vault/` | Sub-directory for recursive testing | +| `heist-project/plans/` | Sub-directory for recursive testing | +| `bank-heist/` | Final challenge workspace | + +--- + +## 🛠️ Commands Used + +### Task 1: Understanding Ownership + +Every file in Linux has **two ownership attributes**: +1. **User Owner** — The user who owns the file +2. 
**Group Owner** — The group associated with the file + +```bash +# View ownership details of files in home directory +ls -l ~ +``` + +**Output Format:** +``` +-rw-r--r-- 1 owner group size date filename +│ │ │ +│ │ └── Group owner +│ └── User owner (file creator) +└── Permissions +``` + +#### Owner vs Group — What's the Difference? + +| Attribute | Description | Example | +|-----------|-------------|---------| +| **Owner** | A single user who "owns" the file. Usually the creator. Has primary control over the file. | `rameez` | +| **Group** | A group of users who share access. Multiple users can belong to the same group. | `developers` | + +> **💡 Key Insight:** The owner can change permissions on the file, while group membership determines which users share the group-level permissions. A user doesn't need to be the owner to have access — they just need to be in the right group. + +--- + +### Task 2: Basic `chown` Operations + +```bash +# Step 1: Create the file +touch devops-file.txt + +# Step 2: Check current ownership +ls -l devops-file.txt +``` + +**Output (Before):** +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:35 devops-file.txt +``` + +```bash +# Step 3: Change owner to tokyo +sudo chown tokyo devops-file.txt + +# Verify the change +ls -l devops-file.txt +``` + +**Output (After chown to tokyo):** +``` +-rw-r--r-- 1 tokyo rameez 0 Feb 18 23:35 devops-file.txt +``` + +```bash +# Step 4: Change owner to berlin +sudo chown berlin devops-file.txt + +# Step 5: Verify the change +ls -l devops-file.txt +``` + +**Output (After chown to berlin):** +``` +-rw-r--r-- 1 berlin rameez 0 Feb 18 23:35 devops-file.txt +``` + +> **📌 Note:** Only `root` (via `sudo`) can change file ownership. A regular user cannot give away their files to another user — this is a security feature to prevent abuse. 
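Ownership can also be inspected without parsing `ls -l`, using GNU `stat`'s format specifiers. A sketch on a scratch file (a freshly created file is always owned by its creator):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/devops-file.txt"

# %U = owner name, %G = group name (%u / %g print numeric UID/GID instead)
stat -c '%U:%G %n' "$tmpdir/devops-file.txt"

# The owner of a brand-new file is the user who created it
[ "$(stat -c '%U' "$tmpdir/devops-file.txt")" = "$(id -un)" ] && echo "owned by $(id -un)"

rm -r "$tmpdir"
```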
+ +--- + +### Task 3: Basic `chgrp` Operations + +```bash +# Step 1: Create the file +touch team-notes.txt + +# Step 2: Check current group +ls -l team-notes.txt +``` + +**Output (Before):** +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:36 team-notes.txt +``` + +```bash +# Step 3: Create the heist-team group +sudo groupadd heist-team + +# Step 4: Change file group to heist-team +sudo chgrp heist-team team-notes.txt + +# Step 5: Verify the change +ls -l team-notes.txt +``` + +**Output (After chgrp):** +``` +-rw-r--r-- 1 rameez heist-team 0 Feb 18 23:36 team-notes.txt +``` + +> **📌 `chgrp` vs `chown :group`:** Both commands can change the group of a file: +> - `sudo chgrp heist-team file.txt` — dedicated group change command +> - `sudo chown :heist-team file.txt` — using chown with `:group` syntax (owner stays unchanged) +> +> They are functionally equivalent for group changes. + +--- + +### Task 4: Combined Owner & Group Change + +The `chown` command supports changing **both** owner and group in a **single command** using the `owner:group` syntax. + +```bash +# Step 1: Create project-config.yaml +touch project-config.yaml + +# Check current ownership +ls -l project-config.yaml +``` + +**Output (Before):** +``` +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:37 project-config.yaml +``` + +```bash +# Step 2: Change BOTH owner to professor AND group to heist-team (one command!) 
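One nuance worth knowing: `sudo` is needed to hand a file to an arbitrary group, but a file's owner may change its group to any group the owner is a member of, with no root at all. A sketch using the current user's own primary group, so it works on any machine:

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/team-notes.txt"

mygroup=$(id -gn)                            # the current user's primary group
chgrp "$mygroup" "$tmpdir/team-notes.txt"    # allowed without sudo: we belong to it

stat -c '%G %n' "$tmpdir/team-notes.txt"     # shows the group that was set

rm -r "$tmpdir"
```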
+sudo chown professor:heist-team project-config.yaml + +# Verify the change +ls -l project-config.yaml +``` + +**Output (After):** +``` +-rw-r--r-- 1 professor heist-team 0 Feb 18 23:37 project-config.yaml +``` + +```bash +# Step 3: Create app-logs directory +mkdir app-logs + +# Step 4: Change directory owner to berlin and group to heist-team +sudo chown berlin:heist-team app-logs/ + +# Verify the directory change +ls -ld app-logs/ +``` + +**Output (After):** +``` +drwxr-xr-x 2 berlin heist-team 4096 Feb 18 23:38 app-logs/ +``` + +#### `chown` Syntax Variations + +| Syntax | Effect | +|--------|--------| +| `chown user file` | Change **owner only** | +| `chown :group file` | Change **group only** (note the colon) | +| `chown user:group file` | Change **both owner and group** | +| `chown user: file` | Change owner and set group to user's **login group** | + +--- + +### Task 5: Recursive Ownership + +The `-R` (recursive) flag applies ownership changes to a directory **and all of its contents** (files and subdirectories). 
+ +```bash +# Step 1: Create the directory structure +mkdir -p heist-project/vault +mkdir -p heist-project/plans +touch heist-project/vault/gold.txt +touch heist-project/plans/strategy.conf + +# Verify the structure +ls -lR heist-project/ +``` + +**Output (Before — all owned by rameez):** +``` +heist-project/: +total 8 +drwxr-xr-x 2 rameez rameez 4096 Feb 18 23:39 plans +drwxr-xr-x 2 rameez rameez 4096 Feb 18 23:39 vault + +heist-project/plans: +total 0 +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:39 strategy.conf + +heist-project/vault: +total 0 +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:39 gold.txt +``` + +```bash +# Step 2: Create the planners group +sudo groupadd planners + +# Step 3: Change ownership RECURSIVELY +sudo chown -R professor:planners heist-project/ + +# Step 4: Verify ALL files and subdirectories changed +ls -lR heist-project/ +``` + +**Output (After — ALL owned by professor:planners):** +``` +heist-project/: +total 8 +drwxr-xr-x 2 professor planners 4096 Feb 18 23:39 plans +drwxr-xr-x 2 professor planners 4096 Feb 18 23:39 vault + +heist-project/plans: +total 0 +-rw-r--r-- 1 professor planners 0 Feb 18 23:39 strategy.conf + +heist-project/vault: +total 0 +-rw-r--r-- 1 professor planners 0 Feb 18 23:39 gold.txt +``` + +> **✅ Key Point:** The `-R` flag changed ownership on: +> - The `heist-project/` directory itself +> - Both subdirectories (`vault/` and `plans/`) +> - Both files inside (`gold.txt` and `strategy.conf`) +> +> **All in a single command!** + +> **⚠️ Warning:** Be extremely careful with `chown -R` on system directories (like `/`, `/etc`, `/var`). A misapplied recursive ownership change can **break your entire system**. 
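A defensive habit before any recursive change is to preview exactly which paths will be touched. One way is a `find` dry run (GNU `chown` also accepts `--from=owner:group`, which restricts a recursive change to files whose current ownership matches, further limiting the blast radius):

```shell
# Rebuild a small tree and list what a recursive chown would touch.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/heist-project/vault" "$tmpdir/heist-project/plans"
touch "$tmpdir/heist-project/vault/gold.txt" "$tmpdir/heist-project/plans/strategy.conf"

# Dry run: every path under the target currently owned by this user
find "$tmpdir/heist-project" -user "$(id -un)" | sort

# Only once the list looks right would you run, e.g.:
#   sudo chown -R professor:planners "$tmpdir/heist-project"

rm -r "$tmpdir"
```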
+ +--- + +### Task 6: Practice Challenge + +```bash +# Step 1: Create users (if not already created from Day 09) +sudo useradd -m tokyo 2>/dev/null +sudo useradd -m berlin 2>/dev/null +sudo useradd -m nairobi 2>/dev/null + +# Step 2: Create groups +sudo groupadd vault-team +sudo groupadd tech-team + +# Step 3: Create the bank-heist directory +mkdir bank-heist + +# Step 4: Create files inside bank-heist +touch bank-heist/access-codes.txt +touch bank-heist/blueprints.pdf +touch bank-heist/escape-plan.txt +``` + +**Verify (Before):** +```bash +ls -l bank-heist/ +``` +``` +total 0 +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:40 access-codes.txt +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:40 blueprints.pdf +-rw-r--r-- 1 rameez rameez 0 Feb 18 23:40 escape-plan.txt +``` + +```bash +# Step 5: Set different ownership for each file +sudo chown tokyo:vault-team bank-heist/access-codes.txt +sudo chown berlin:tech-team bank-heist/blueprints.pdf +sudo chown nairobi:vault-team bank-heist/escape-plan.txt +``` + +**Verify (After):** +```bash +ls -l bank-heist/ +``` +``` +total 0 +-rw-r--r-- 1 tokyo vault-team 0 Feb 18 23:40 access-codes.txt +-rw-r--r-- 1 berlin tech-team 0 Feb 18 23:40 blueprints.pdf +-rw-r--r-- 1 nairobi vault-team 0 Feb 18 23:40 escape-plan.txt +``` + +> **✅ Each file now has a different owner and group, demonstrating granular ownership control!** + +--- + +## 🔄 Ownership Changes Summary + +| File/Directory | Before (owner:group) | After (owner:group) | Command | +|----------------|---------------------|---------------------|---------| +| `devops-file.txt` | `rameez:rameez` | `berlin:rameez` | `sudo chown berlin devops-file.txt` | +| `team-notes.txt` | `rameez:rameez` | `rameez:heist-team` | `sudo chgrp heist-team team-notes.txt` | +| `project-config.yaml` | `rameez:rameez` | `professor:heist-team` | `sudo chown professor:heist-team project-config.yaml` | +| `app-logs/` | `rameez:rameez` | `berlin:heist-team` | `sudo chown berlin:heist-team app-logs/` | +| 
`heist-project/` (recursive) | `rameez:rameez` | `professor:planners` | `sudo chown -R professor:planners heist-project/` |
| `access-codes.txt` | `rameez:rameez` | `tokyo:vault-team` | `sudo chown tokyo:vault-team bank-heist/access-codes.txt` |
| `blueprints.pdf` | `rameez:rameez` | `berlin:tech-team` | `sudo chown berlin:tech-team bank-heist/blueprints.pdf` |
| `escape-plan.txt` | `rameez:rameez` | `nairobi:vault-team` | `sudo chown nairobi:vault-team bank-heist/escape-plan.txt` |

---

## 📝 Summary of All Commands

| # | Command | Purpose |
|---|---------|---------|
| 1 | `ls -l <file>` | View file ownership (owner and group) |
| 2 | `ls -ld <directory>` | View directory ownership |
| 3 | `ls -lR <directory>` | View ownership recursively for all contents |
| 4 | `sudo chown <user> <file>` | Change file owner only |
| 5 | `sudo chown :<group> <file>` | Change file group only (using chown) |
| 6 | `sudo chown <user>:<group> <file>` | Change both owner and group at once |
| 7 | `sudo chown <user>: <file>` | Change owner and set group to user's login group |
| 8 | `sudo chgrp <group> <file>` | Change file group (dedicated command) |
| 9 | `sudo chown -R <user>:<group> <directory>` | Recursively change ownership of dir and all contents |
| 10 | `sudo chgrp -R <group> <directory>` | Recursively change group of dir and all contents |
| 11 | `sudo useradd -m <username>` | Create a new user (prerequisite for chown) |
| 12 | `sudo groupadd <groupname>` | Create a new group (prerequisite for chgrp) |

---

## 🧠 `chown` vs `chgrp` — When to Use Which?

| Feature | `chown` | `chgrp` |
| -------------------------- | --------------------- | --------- |
| Change owner | ✅ Yes | ❌ No |
| Change group | ✅ Yes (with `:group`) | ✅ Yes |
| Change both simultaneously | ✅ Yes (`user:group`) | ❌ No |
| Recursive (`-R`) | ✅ Yes | ✅ Yes |
| Requires `sudo` | ✅ Usually | ✅ Usually |

> **💡 Recommendation:** Use `chown` for most scenarios since it can do everything `chgrp` can plus more. Use `chgrp` when you explicitly want to **only** change the group and want your command to be self-documenting.
+ +--- + +## 💡 What I Learned + +### 1. Ownership is Separate from Permissions +Ownership (who owns the file) and permissions (what actions are allowed) are **two distinct systems** that work together: +- **Ownership** determines which permission set applies to a user (owner, group, or others) +- **Permissions** determine what actions are allowed for each category + +For example, a file owned by `tokyo:developers` with permissions `640`: +- `tokyo` (owner) → gets `rw-` (read + write) +- Users in `developers` group → get `r--` (read only) +- Everyone else → gets `---` (no access) + +### 2. The `-R` Flag is Powerful but Dangerous +Recursive ownership changes (`chown -R`) affect **every single file and directory** within the target. This is incredibly useful for: +- Setting up new application deployments +- Fixing broken permissions after a migration +- Provisioning shared project directories + +But it can also be **destructive** if applied to wrong directories. Always double-check the path before running `chown -R`, especially with `sudo`. + +### 3. Why `sudo` is Required for `chown` +Linux prevents regular users from changing file ownership as a **security measure**: +- **Prevents quota abuse** — A user could create large files and "give" them to another user, consuming their disk quota +- **Prevents privilege escalation** — A user could create a setuid program and assign it to root +- **Maintains accountability** — File ownership provides an audit trail of who created what + +Only `root` (via `sudo`) can change ownership — this is by design, not a limitation! 
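The refusal is easy to observe first-hand. A sketch, guarded so it only attempts the give-away when not running as root (root would simply succeed):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/file.txt"

if [ "$(id -u)" -ne 0 ]; then
    # A regular user may not give a file away: the kernel refuses
    chown root "$tmpdir/file.txt" 2>&1 || echo "refused, as designed"
fi

stat -c '%U' "$tmpdir/file.txt"   # still owned by the original creator (when non-root)

rm -r "$tmpdir"
```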


---

## 🏗️ Real-World DevOps Use Cases

| Scenario | Command Example | Why It Matters |
|----------|----------------|----------------|
| **Application deployment** | `sudo chown -R www-data:www-data /var/www/app/` | Web server needs ownership of app files |
| **Shared team directory** | `sudo chown :dev-team /opt/project/` | Team members need group access to project files |
| **Container file mounts** | `sudo chown -R 1000:1000 /data/volume/` | Containers often run as specific UIDs |
| **CI/CD artifact directory** | `sudo chown jenkins:jenkins /var/lib/jenkins/` | Build agent needs ownership of its workspace |
| **Log file management** | `sudo chown syslog:adm /var/log/app.log` | Log collector needs appropriate ownership |
| **Database data directory** | `sudo chown -R postgres:postgres /var/lib/postgresql/` | Database must own its data files |
| **SSH key setup** | `sudo chown user:user ~/.ssh/authorized_keys` | SSH daemon enforces strict ownership checks |

---

## 🔍 Troubleshooting Tips

| Issue | Cause | Solution |
|-------|-------|----------|
| `chown: invalid user` | User doesn't exist | Create user first: `sudo useradd <username>` |
| `chgrp: invalid group` | Group doesn't exist | Create group first: `sudo groupadd <groupname>` |
| `Operation not permitted` | Not using `sudo` | Prefix with `sudo`: `sudo chown ...` |
| Ownership didn't change on subdirectories | Forgot `-R` flag | Use recursive: `sudo chown -R user:group dir/` |
| App can't read files after chown | Wrong user/group | Verify the app runs as the correct user with `ps aux` |
| Container permission issues | UID mismatch | Match container UID with `chown <uid>:<gid> dir/` |

---

diff --git a/2026/day-12/day-12-revision.md b/2026/day-12/day-12-revision.md
new file mode 100644
index 0000000000..c7c5eb5f70
--- /dev/null
+++ b/2026/day-12/day-12-revision.md
@@ -0,0 +1,288 @@
# Day 12 – Breather & Revision (Days 01–11)

**Date:** 2026-02-18
**Author:** Rameez Ahmed
**Goal:** Consolidate and
reinforce fundamentals from Days 01–11 + +--- + +## 📋 Overview + +Today is a **revision day** — no new concepts, just strengthening the foundation built over the last 11 days. This is about **retention**, not rushing ahead. A strong DevOps engineer revisits fundamentals regularly because these are the commands and concepts you'll reach for first in production incidents. + +--- + +## 🔁 Review Summary (Days 01–11) + +### Day-by-Day Recap + +| Day | Topic | Key Takeaway | +| ---------- | ---------------------------- | -------------------------------------------------------------------- | +| **Day 01** | DevOps Intro & Learning Plan | DevOps = Culture + Automation + CI/CD + Monitoring | +| **Day 02** | Linux Basics & OS Overview | Linux is the backbone of most production systems | +| **Day 03** | Essential Linux Commands | Navigation, file ops, and pipeline commands are daily tools | +| **Day 04** | Process Management | `ps`, `top`, `kill` — understanding what's running on the system | +| **Day 05** | Services & Systemctl | `systemctl` is the gateway to managing Linux services | +| **Day 06** | File System & Navigation | Understanding the Linux directory hierarchy (`/etc`, `/var`, `/opt`) | +| **Day 07** | Package Management | `apt`/`yum` — installing, updating, and removing software | +| **Day 08** | Nginx Deployment on AWS | Real-world cloud deployment with EC2 and web server setup | +| **Day 09** | User & Group Management | `useradd`, `groupadd`, `usermod -aG` — controlling access | +| **Day 10** | File Permissions | `chmod`, the `rwx` model, octal notation (644, 755, etc.) 
| +| **Day 11** | File Ownership | `chown`, `chgrp`, recursive ownership with `-R` | + +--- + +## 🎯 Mindset & Plan Revisit + +### Original Goals (Day 01) +- ✅ Learn Linux fundamentals — **Completed (Days 02–11)** +- ✅ Understand cloud basics — **Started (Day 08 — AWS EC2 + Nginx)** +- 🔄 Build automation skills — **Coming up in the next phase** +- 🔄 Master CI/CD pipelines — **Planned for later days** + +### Tweaks to the Plan +- **Spend more time on scripting** — Bash scripting will tie together all the commands learned so far +- **Practice troubleshooting scenarios** — Not just running commands, but diagnosing real problems +- **Focus on networking next** — Understanding ports, firewalls, and connectivity is essential for DevOps + +--- + +## 🔧 Hands-On Re-runs + +### 1. Processes & Services (Days 04–05) + +```bash +# Check running processes +ps aux | head -15 +``` + +**Observation:** The system is running essential services like `sshd`, `systemd`, and user processes. The `ps aux` output shows CPU/memory usage — useful for identifying resource-hungry processes. + +```bash +# Check the status of a service +systemctl status sshd +``` + +**Observation:** The SSH daemon is active and running, showing its PID, memory usage, and recent log entries. This is the first command to run when checking if a service is healthy. + +```bash +# View recent logs for a service +journalctl -u sshd --no-pager -n 10 +``` + +**Observation:** `journalctl` provides timestamped log entries, which is crucial for debugging connection issues or authentication failures. + +--- + +### 2. 
File Skills Practice (Days 06–11) + +```bash +# Create a test file and append content +echo "Revision day practice" > revision-test.txt +echo "Adding more content" >> revision-test.txt +cat revision-test.txt +``` + +```bash +# Set specific permissions +chmod 640 revision-test.txt +ls -l revision-test.txt +``` + +**Output:** +``` +-rw-r----- 1 rameez rameez 41 Feb 18 23:40 revision-test.txt +``` + +```bash +# Create a directory with specific permissions +mkdir -m 755 revision-project +ls -ld revision-project/ +``` + +**Output:** +``` +drwxr-xr-x 2 rameez rameez 4096 Feb 18 23:40 revision-project/ +``` + +--- + +### 3. User/Group Sanity Check (Days 09 & 11) + +```bash +# Recreate a small scenario: create user and verify +sudo useradd -m testuser-revision +id testuser-revision +``` + +**Expected Output:** +``` +uid=1005(testuser-revision) gid=1005(testuser-revision) groups=1005(testuser-revision) +``` + +```bash +# Create a group and add the user +sudo groupadd revision-team +sudo usermod -aG revision-team testuser-revision + +# Verify group membership +groups testuser-revision +``` + +**Expected Output:** +``` +testuser-revision : testuser-revision revision-team +``` + +```bash +# Change file ownership and verify +touch ownership-test.txt +sudo chown testuser-revision:revision-team ownership-test.txt +ls -l ownership-test.txt +``` + +**Expected Output:** +``` +-rw-r--r-- 1 testuser-revision revision-team 0 Feb 18 23:41 ownership-test.txt +``` + +```bash +# Cleanup +sudo userdel -r testuser-revision +sudo groupdel revision-team +rm ownership-test.txt +``` + +--- + +### 4. 
Cheat Sheet Refresh (Day 03)

#### 🔥 Top 5 Commands I'd Reach for First in an Incident

| # | Command | Why It's First |
|---|---------|----------------|
| 1 | `systemctl status <service>` | Instantly tells if a service is running, failed, or inactive |
| 2 | `journalctl -u <service> -n 50 --no-pager` | Shows recent logs to diagnose WHY something failed |
| 3 | `ps aux \| grep <process>` | Finds if a specific process is running and its resource usage |
| 4 | `df -h` | Checks disk space — full disks cause silent failures everywhere |
| 5 | `tail -f /var/log/syslog` | Live-stream system logs to see errors as they happen |

> **💡 Incident Response Order:** Check service → Read logs → Check processes → Check resources → Check network

---

## ✅ Mini Self-Check

### 1) Which 3 commands save you the most time right now, and why?

| Command | Why It Saves Time |
|---------|-------------------|
| **`systemctl status <service>`** | One command gives you running state, PID, memory, and recent logs — replaces 3-4 separate checks |
| **`chmod 755 <file>`** | Octal notation sets exact permissions in one shot instead of multiple `u+x`, `g+r`, `o+r` calls |
| **`chown -R user:group dir/`** | Recursively fixes ownership on entire directory trees in a single command — would take dozens of individual commands otherwise |

---

### 2) How do you check if a service is healthy? List the exact 2–3 commands you'd run first.
+ +```bash +# Command 1: Check if the service is active and running +systemctl status nginx + +# Command 2: Check recent service logs for errors +journalctl -u nginx -n 20 --no-pager + +# Command 3: Verify the process is actually listening on the expected port +ss -tlnp | grep nginx +``` + +**What to look for:** +- `systemctl status` → Should show `active (running)` in green +- `journalctl` → Should have no `ERROR` or `FATAL` entries +- `ss -tlnp` → Should show the service listening on the expected port (e.g., `0.0.0.0:80`) + +--- + +### 3) How do you safely change ownership and permissions without breaking access? Give one example command. + +**Safe approach — always verify before and after:** + +```bash +# Step 1: Check CURRENT permissions (before making changes) +ls -la /opt/app-directory/ + +# Step 2: Change ownership (use -R carefully, double-check the path!) +sudo chown -R www-data:www-data /opt/app-directory/ + +# Step 3: Set permissions — files get 644, directories get 755 +sudo find /opt/app-directory/ -type f -exec chmod 644 {} \; +sudo find /opt/app-directory/ -type d -exec chmod 755 {} \; + +# Step 4: VERIFY the changes look correct +ls -la /opt/app-directory/ +``` + +> **🔑 Key Safety Rule:** Never blindly run `chown -R` or `chmod -R` without first confirming the path. A typo like `chown -R user:group /` (root!) vs `chown -R user:group ./` (current dir) can destroy the system. + +--- + +### 4) What will you focus on improving in the next 3 days? 
+ +| Focus Area | Why | Plan | +|------------|-----|------| +| **Shell Scripting** | Automate repetitive tasks instead of manual commands | Write scripts that combine user management + permissions | +| **Networking Fundamentals** | Understanding ports, DNS, and firewalls is critical for deployments | Learn `ss`, `netstat`, `iptables`, `curl` diagnostics | +| **Real Troubleshooting** | Move from "knowing commands" to "diagnosing problems" | Practice scenarios: "service down", "disk full", "permission denied" | + +--- + +## 🧠 Key Takeaways from Days 01–11 + +### The Big Picture + +``` + ┌──────────────────────────────────┐ + │ DevOps Foundation │ + │ (Days 01-11) │ + └──────────────┬───────────────────┘ + │ + ┌────────────────────┼────────────────────┐ + │ │ │ + ┌─────┴─────┐ ┌──────┴──────┐ ┌─────┴─────┐ + │ System │ │ Access │ │ Cloud │ + │ Admin │ │ Control │ │ Deploy │ + │ │ │ │ │ │ + │ • Commands │ │ • Users │ │ • AWS EC2 │ + │ • Processes│ │ • Groups │ │ • Nginx │ + │ • Services │ │ • Perms │ │ • Security│ + │ • Packages │ │ • Ownership │ │ Groups │ + └───────────┘ └─────────────┘ └───────────┘ + Days 02-07 Days 09-11 Day 08 +``` + +### Top 5 Concepts That Connect Everything + +1. **Everything in Linux is a file** — Processes (`/proc`), devices (`/dev`), configs (`/etc`) — understanding files = understanding Linux +2. **Permissions + Ownership = Security** — `chmod` controls WHAT can be done, `chown` controls WHO the rules apply to +3. **`systemctl` is the control center** — Starting, stopping, and monitoring services is the #1 daily DevOps task +4. **Always verify** — `ls -l`, `id`, `groups`, `systemctl status` — never assume, always check +5. 
**`sudo` responsibly** — With great power comes great responsibility — understand what each command does before running it as root + +--- + +## 📊 Skills Progress Tracker + +| Skill | Day Learned | Confidence Level | Notes | +|-------|-------------|-------------------|-------| +| Navigation & basic commands | Day 02-03 | ⭐⭐⭐⭐⭐ | Comfortable — use daily | +| Process management | Day 04 | ⭐⭐⭐⭐ | Good — need more practice with `kill` signals | +| Service management | Day 05 | ⭐⭐⭐⭐ | Solid — `systemctl` is second nature | +| File system navigation | Day 06 | ⭐⭐⭐⭐⭐ | Know the key directories well | +| Package management | Day 07 | ⭐⭐⭐⭐ | Good with `apt`, need to practice `yum` | +| Cloud deployment | Day 08 | ⭐⭐⭐ | Did it once — need more hands-on practice | +| User & group management | Day 09 | ⭐⭐⭐⭐ | Confident with `useradd`, `usermod -aG` | +| File permissions | Day 10 | ⭐⭐⭐⭐ | Octal notation is clear — practice symbolic more | +| File ownership | Day 11 | ⭐⭐⭐⭐ | `chown` and `chgrp` are straightforward | + +--- + diff --git a/2026/day-13/day-13-lvm.md b/2026/day-13/day-13-lvm.md new file mode 100644 index 0000000000..f8ec341f3c --- /dev/null +++ b/2026/day-13/day-13-lvm.md @@ -0,0 +1,678 @@ +# Day 13 – Linux Volume Management (LVM) + +**Date:** 2026-02-18 +**Author:** Rameez Ahmed +**Challenge:** Learn LVM to manage storage flexibly — create, extend, and mount volumes +**Reference:** [Linux LVM Tutorial](https://youtu.be/Evnf2AAt7FQ?si=ncnfQYySYtK_2K3c) + +--- + +## 📋 Overview + +**Logical Volume Management (LVM)** is a storage management framework in Linux that provides an abstraction layer between physical disks and the file systems that the OS and applications use. Unlike traditional partitioning, LVM allows you to **resize volumes on the fly**, **span multiple disks**, and **create snapshots** — making it the go-to storage solution in production DevOps environments. 
+ +> **🎯 Why LVM matters for DevOps:** +> In production, running out of disk space on a critical volume can cause **downtime, data loss, and failed deployments**. LVM lets you extend storage without unmounting or rebooting — a must-have for zero-downtime operations. + +--- + +## 🏗️ LVM Architecture + +Understanding the **three-layer architecture** of LVM is essential before running any commands: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ 📂 FILE SYSTEMS │ +│ (ext4, xfs, btrfs) │ +│ What applications and users interact with │ +│ │ +│ /mnt/app-data /mnt/db-data /mnt/logs │ +└───────────┬───────────────────┬──────────────────┬──────────┘ + │ │ │ +┌───────────▼───────────────────▼──────────────────▼──────────┐ +│ 🧱 LOGICAL VOLUMES (LVs) │ +│ The "virtual partitions" you use │ +│ │ +│ lv-app-data (500M) lv-db-data (1G) lv-logs (200M) │ +└───────────┬───────────────────┬──────────────────┬──────────┘ + │ │ │ +┌───────────▼───────────────────▼──────────────────▼──────────┐ +│ 📦 VOLUME GROUPS (VGs) │ +│ Pool of storage (combines PVs) │ +│ │ +│ devops-vg (total: 2G from 2 physical disks) │ +└───────────┬───────────────────────────────┬─────────────────┘ + │ │ +┌───────────▼──────────┐ ┌───────────────▼─────────────────┐ +│ 💿 Physical Volume │ │ 💿 Physical Volume │ +│ (PV) - /dev/sdb │ │ (PV) - /dev/sdc │ +│ 1 GB disk │ │ 1 GB disk │ +└───────────┬──────────┘ └───────────────┬─────────────────┘ + │ │ +┌───────────▼──────────┐ ┌───────────────▼─────────────────┐ +│ 🔩 Physical Disk │ │ 🔩 Physical Disk │ +│ /dev/sdb │ │ /dev/sdc │ +│ (HDD/SSD/Virtual) │ │ (HDD/SSD/Virtual) │ +└──────────────────────┘ └─────────────────────────────────┘ +``` + +### The Three Layers Explained + +| Layer | Component | Abbreviation | What It Does | +|-------|-----------|-------------|--------------| +| **Bottom** | Physical Volume | **PV** | A raw disk or partition initialized for LVM use | +| **Middle** | Volume Group | **VG** | A pool that combines one or more PVs 
into a single storage space | +| **Top** | Logical Volume | **LV** | A "virtual partition" carved from the VG — this is what you format and mount | + +> **💡 Analogy:** Think of it like building with LEGO: +> - **PVs** = Individual LEGO bricks (your physical disks) +> - **VG** = The LEGO baseplate (combines bricks into a usable surface) +> - **LVs** = The structures you build on the baseplate (your usable volumes) + +--- + +## ⚙️ Prerequisites: Setting Up a Virtual Disk + +If you don't have a spare physical disk, create a **virtual disk** for safe practice: + +```bash +# Switch to root user +sudo -i + +# Create a 1GB virtual disk image +dd if=/dev/zero of=/tmp/disk1.img bs=1M count=1024 +``` + +**Output:** +``` +1024+0 records in +1024+0 records out +1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.51234 s, 427 MB/s +``` + +```bash +# Attach the virtual disk as a loop device +losetup -fP /tmp/disk1.img + +# Verify the loop device was created (note the device name) +losetup -a +``` + +**Output:** +``` +/dev/loop0: [64769]:123456 (/tmp/disk1.img) +``` + +> **📌 Note:** Your device might be `/dev/loop0`, `/dev/loop1`, etc., depending on what's already in use. Use the device name shown in the output for all subsequent commands. 
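
Rather than reading the device name by eye, you can extract it from `losetup -a`-style output. A small sketch — the sample line is hard-coded from the output above so the snippet is self-contained, and the parsing is a hypothetical helper, not part of LVM itself:

```shell
# Hypothetical helper: find the loop device backing a given image file
# by matching the last field of a `losetup -a`-style line.
losetup_line='/dev/loop0: [64769]:123456 (/tmp/disk1.img)'
img='/tmp/disk1.img'

# The image appears in parentheses as the last field; strip the trailing
# colon from the device field once a match is found.
loop_dev=$(printf '%s\n' "$losetup_line" | awk -v img="($img)" '$NF == img { sub(":", "", $1); print $1 }')
echo "$loop_dev"   # /dev/loop0
```

In a live session you can skip the parsing entirely: `losetup -fP --show /tmp/disk1.img` attaches the image and prints the chosen device in one step.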
+ +--- + +## 🛠️ Challenge Tasks + +### Task 1: Check Current Storage + +Before making changes, **always audit the current state** of your storage: + +```bash +# View block devices (disks and partitions) +lsblk +``` + +**Expected Output:** +``` +NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS +loop0 7:0 0 1G 0 loop +sda 8:0 0 30G 0 disk +├─sda1 8:1 0 1M 0 part +├─sda2 8:2 0 1.8G 0 part /boot +└─sda3 8:3 0 28.2G 0 part / +``` + +```bash +# Check existing Physical Volumes +pvs +``` + +**Expected Output (fresh system):** +``` + PV VG Fmt Attr PSize PFree + (empty — no PVs configured yet) +``` + +```bash +# Check existing Volume Groups +vgs +``` + +**Expected Output (fresh system):** +``` + VG #PV #LV #SN Attr VSize VFree + (empty — no VGs configured yet) +``` + +```bash +# Check existing Logical Volumes +lvs +``` + +**Expected Output (fresh system):** +``` + LV VG Attr LSize Pool Origin Data% Meta% + (empty — no LVs configured yet) +``` + +```bash +# Check mounted filesystem disk usage +df -h +``` + +**Expected Output:** +``` +Filesystem Size Used Avail Use% Mounted on +/dev/sda3 28G 4G 23G 15% / +tmpfs 2.0G 0 2.0G 0% /dev/shm +/dev/sda2 1.8G 120M 1.6G 7% /boot +``` + +> **🔑 Key Insight:** The `pvs → vgs → lvs` command chain follows the LVM hierarchy from bottom to top. In a production audit, always run all three to get the full picture. + +--- + +### Task 2: Create Physical Volume (PV) + +Initialize the disk for LVM use: + +```bash +# Create a Physical Volume on the loop device +pvcreate /dev/loop0 +``` + +**Expected Output:** +``` + Physical volume "/dev/loop0" successfully created. 
+``` + +```bash +# Verify the PV was created +pvs +``` + +**Expected Output:** +``` + PV VG Fmt Attr PSize PFree + /dev/loop0 lvm2 a-- 1.00g 1.00g +``` + +```bash +# Detailed PV information +pvdisplay /dev/loop0 +``` + +**Expected Output:** +``` + "/dev/loop0" is a new physical volume of "1.00 GiB" + --- NEW Physical volume --- + PV Name /dev/loop0 + VG Name + PV Size 1.00 GiB + Allocatable NO + PE Size 0 + Total PE 0 + Free PE 0 + Allocated PE 0 + PV UUID xxxx-xxxx-xxxx-xxxx +``` + +> **📌 Note:** `VG Name` is empty because this PV hasn't been assigned to any Volume Group yet. `Allocatable: NO` confirms it's standalone at this point. + +--- + +### Task 3: Create Volume Group (VG) + +Create a storage pool from one or more Physical Volumes: + +```bash +# Create a Volume Group named "devops-vg" using the PV +vgcreate devops-vg /dev/loop0 +``` + +**Expected Output:** +``` + Volume group "devops-vg" successfully created +``` + +```bash +# Verify the VG +vgs +``` + +**Expected Output:** +``` + VG #PV #LV #SN Attr VSize VFree + devops-vg 1 0 0 wz--n- 1020.00m 1020.00m +``` + +```bash +# Detailed VG information +vgdisplay devops-vg +``` + +**Expected Output:** +``` + --- Volume group --- + VG Name devops-vg + System ID + Format lvm2 + VG Access read/write + VG Status resizable + MAX LV 0 + Cur LV 0 + Open LV 0 + Max PV 0 + Cur PV 1 + Act PV 1 + VG Size 1020.00 MiB + PE Size 4.00 MiB + Total PE 255 + Alloc PE / Size 0 / 0 + Free PE / Size 255 / 1020.00 MiB + VG UUID xxxx-xxxx-xxxx-xxxx +``` + +> **💡 Why is VG Size 1020M and not 1024M?** +> LVM reserves a small amount of space for metadata. This is normal — the actual usable space is always slightly less than the physical disk size. + +> **📌 PE (Physical Extents):** LVM divides storage into fixed-size chunks called **Physical Extents** (default 4MB each). 255 PEs × 4MB = 1020MB. This is the smallest unit LVM can allocate. 
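
The PE arithmetic above can be checked with plain shell math — a quick sketch using the numbers from this walkthrough:

```shell
# Physical Extent math from the vgdisplay output above:
# usable VG size divided by PE size gives Total PE.
pe_size_mib=4        # default PE size
vg_size_mib=1020     # VG size after LVM metadata overhead

total_pe=$(( vg_size_mib / pe_size_mib ))
echo "Total PE: $total_pe"   # Total PE: 255
```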
+ +--- + +### Task 4: Create Logical Volume (LV) + +Carve out a usable "virtual partition" from the Volume Group: + +```bash +# Create a 500MB Logical Volume named "app-data" inside "devops-vg" +lvcreate -L 500M -n app-data devops-vg +``` + +**Expected Output:** +``` + Logical volume "app-data" created. +``` + +```bash +# Verify the LV +lvs +``` + +**Expected Output:** +``` + LV VG Attr LSize Pool Origin Data% Meta% + app-data devops-vg -wi-a----- 500.00m +``` + +```bash +# Detailed LV information +lvdisplay /dev/devops-vg/app-data +``` + +**Expected Output:** +``` + --- Logical volume --- + LV Path /dev/devops-vg/app-data + LV Name app-data + VG Name devops-vg + LV UUID xxxx-xxxx-xxxx-xxxx + LV Write Access read/write + LV Creation host, time hostname, 2026-02-18 23:45:00 +0500 + LV Status available + # open 0 + LV Size 500.00 MiB + Current LE 125 + Segments 1 + Allocation inherit + Read ahead sectors auto + - currently set to 256 + Block device 253:0 +``` + +> **📊 Storage Accounting:** +> - VG total: **1020M** +> - LV allocated: **500M** +> - VG free: **520M** (available for creating more LVs!) 
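
The same extent math explains the `Current LE 125` field in `lvdisplay` and the remaining free space — a sketch with the example numbers:

```shell
# Logical Extents mirror Physical Extents: a 500M LV at 4M per extent
# occupies 125 LEs, leaving 520M of the 1020M VG unallocated.
pe_size_mib=4
vg_size_mib=1020
lv_size_mib=500

lv_le=$(( lv_size_mib / pe_size_mib ))
vg_free_mib=$(( vg_size_mib - lv_size_mib ))
echo "LV extents: $lv_le, VG free: ${vg_free_mib}M"   # LV extents: 125, VG free: 520M
```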
+ +--- + +### Task 5: Format and Mount + +A Logical Volume is just a raw block device — you need to create a **filesystem** on it and **mount** it to make it usable: + +```bash +# Step 1: Format the LV with ext4 filesystem +mkfs.ext4 /dev/devops-vg/app-data +``` + +**Expected Output:** +``` +mke2fs 1.46.5 (30-Dec-2021) +Creating filesystem with 128000 4k blocks and 128000 inodes +Filesystem UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx +Superblock backups stored on blocks: + 32768, 98304 + +Allocating group tables: done +Writing inode tables: done +Creating journal (4096 blocks): done +Writing superblocks and filesystem accounting information: done +``` + +```bash +# Step 2: Create the mount point directory +mkdir -p /mnt/app-data + +# Step 3: Mount the formatted LV +mount /dev/devops-vg/app-data /mnt/app-data + +# Step 4: Verify it's mounted and usable +df -h /mnt/app-data +``` + +**Expected Output:** +``` +Filesystem Size Used Avail Use% Mounted on +/dev/mapper/devops--vg-app--data 469M 14K 434M 1% /mnt/app-data +``` + +```bash +# Test by writing data to the volume +echo "LVM is working!" > /mnt/app-data/test.txt +cat /mnt/app-data/test.txt +``` + +**Output:** +``` +LVM is working! +``` + +> **⚠️ Persistent Mounting:** The mount above is temporary — it will be lost after a reboot. 
To make it permanent, add an entry to `/etc/fstab`: +> ```bash +> echo '/dev/devops-vg/app-data /mnt/app-data ext4 defaults 0 2' >> /etc/fstab +> ``` + +--- + +### Task 6: Extend the Volume 🔥 + +This is where LVM truly shines — **extending a live volume without downtime**: + +```bash +# Check current size +df -h /mnt/app-data +``` + +**Output (Before):** +``` +Filesystem Size Used Avail Use% Mounted on +/dev/mapper/devops--vg-app--data 469M 14K 434M 1% /mnt/app-data +``` + +```bash +# Step 1: Extend the Logical Volume by 200MB +lvextend -L +200M /dev/devops-vg/app-data +``` + +**Expected Output:** +``` + Size of logical volume devops-vg/app-data changed from 500.00 MiB (125 extents) to 700.00 MiB (175 extents). + Logical volume devops-vg/app-data successfully resized. +``` + +```bash +# Step 2: Resize the filesystem to use the new space +resize2fs /dev/devops-vg/app-data +``` + +**Expected Output:** +``` +resize2fs 1.46.5 (30-Dec-2021) +Filesystem at /dev/devops-vg/app-data is mounted on /mnt/app-data; on-line resizing required +old_desc_blocks = 2, new_desc_blocks = 3 +The filesystem on /dev/devops-vg/app-data is now 179200 (4k) blocks long. +``` + +```bash +# Step 3: Verify the extended size +df -h /mnt/app-data +``` + +**Output (After):** +``` +Filesystem Size Used Avail Use% Mounted on +/dev/mapper/devops--vg-app--data 662M 14K 612M 1% /mnt/app-data +``` + +> **✅ The volume grew from ~469M to ~662M while still mounted! No downtime, no data loss!** + +> **💡 Pro Tip:** You can combine both steps into one command: +> ```bash +> lvextend -L +200M --resizefs /dev/devops-vg/app-data +> ``` +> The `--resizefs` flag automatically resizes the filesystem after extending the LV. 
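
Notice that `df` reports the volume as `/dev/mapper/devops--vg-app--data` rather than `/dev/devops-vg/app-data`. Both paths point at the same device: device-mapper joins the VG and LV names with a single hyphen and escapes hyphens *inside* each name by doubling them. A sketch of the mapping (assumes bash for the `${var//-/--}` substitution):

```shell
# Reconstruct the /dev/mapper name device-mapper uses for a VG/LV pair:
# hyphens inside each name are doubled, then the two are joined with "-".
vg='devops-vg'
lv='app-data'

mapper="/dev/mapper/${vg//-/--}-${lv//-/--}"
echo "$mapper"   # /dev/mapper/devops--vg-app--data
```

`lvdisplay` shows the `/dev/<vg>/<lv>` form, while `df` and `lsblk` usually show the mapper form — knowing the doubling rule lets you match them up at a glance.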
+ +--- + +## 📊 LVM Operations Flow + +Here's the complete workflow visualized from start to finish: + +``` + ┌──────────────┐ + │ Physical │ pvcreate /dev/loop0 + │ Disk │─────────────────────────────┐ + │ /dev/loop0 │ │ + └──────────────┘ ▼ + ┌──────────────────┐ + │ Physical Volume │ + │ (PV) │ + │ /dev/loop0 │ + └────────┬─────────┘ + │ + vgcreate devops-vg /dev/loop0 + │ + ▼ + ┌──────────────────┐ + │ Volume Group │ + │ (VG) │ + │ devops-vg │ + │ Total: 1020M │ + └────────┬─────────┘ + │ + lvcreate -L 500M -n app-data devops-vg + │ + ▼ + ┌──────────────────┐ + │ Logical Volume │ + │ (LV) │ + │ app-data: 500M │ + │ Free in VG: 520M │ + └────────┬─────────┘ + │ + mkfs.ext4 /dev/devops-vg/app-data + │ + ▼ + ┌──────────────────┐ + │ Filesystem │ + │ ext4 │ + └────────┬─────────┘ + │ + mount /dev/devops-vg/app-data /mnt/app-data + │ + ▼ + ┌──────────────────┐ + │ 📂 /mnt/app-data │ + │ Usable storage! │ + └──────────────────┘ +``` + +--- + +## 📝 Complete LVM Command Reference + +### Core LVM Commands + +| Layer | Action | Command | Example | +|-------|--------|---------|---------| +| **PV** | Create | `pvcreate` | `pvcreate /dev/sdb` | +| **PV** | List | `pvs` or `pvdisplay` | `pvs` | +| **PV** | Remove | `pvremove` | `pvremove /dev/sdb` | +| **VG** | Create | `vgcreate` | `vgcreate my-vg /dev/sdb` | +| **VG** | List | `vgs` or `vgdisplay` | `vgs` | +| **VG** | Extend | `vgextend` | `vgextend my-vg /dev/sdc` | +| **VG** | Remove | `vgremove` | `vgremove my-vg` | +| **LV** | Create | `lvcreate` | `lvcreate -L 500M -n my-lv my-vg` | +| **LV** | List | `lvs` or `lvdisplay` | `lvs` | +| **LV** | Extend | `lvextend` | `lvextend -L +200M /dev/my-vg/my-lv` | +| **LV** | Reduce | `lvreduce` | `lvreduce -L -100M /dev/my-vg/my-lv` | +| **LV** | Remove | `lvremove` | `lvremove /dev/my-vg/my-lv` | + +### Filesystem Commands + +| Action | Command | Example | +|--------|---------|---------| +| Format with ext4 | `mkfs.ext4` | `mkfs.ext4 /dev/my-vg/my-lv` | +| Format with XFS | 
`mkfs.xfs` | `mkfs.xfs /dev/my-vg/my-lv` | +| Mount | `mount` | `mount /dev/my-vg/my-lv /mnt/data` | +| Unmount | `umount` | `umount /mnt/data` | +| Resize ext4 | `resize2fs` | `resize2fs /dev/my-vg/my-lv` | +| Resize XFS | `xfs_growfs` | `xfs_growfs /mnt/data` | + +### Virtual Disk Commands (for practice) + +| Action | Command | Example | +|--------|---------|---------| +| Create virtual disk | `dd` | `dd if=/dev/zero of=/tmp/disk.img bs=1M count=1024` | +| Attach as loop device | `losetup` | `losetup -fP /tmp/disk.img` | +| List loop devices | `losetup -a` | `losetup -a` | +| Detach loop device | `losetup -d` | `losetup -d /dev/loop0` | + +--- + +## 🆚 LVM vs Traditional Partitioning + +| Feature | Traditional Partitioning | LVM | +|---------|------------------------|-----| +| Resize volumes | ❌ Very difficult, often requires unmounting | ✅ Extend/shrink on-the-fly | +| Span multiple disks | ❌ One partition = one disk | ✅ VG can span multiple disks | +| Snapshots | ❌ Not supported | ✅ Built-in snapshot support | +| Add new storage | ❌ Create new partition, new mount | ✅ Add PV to VG, extend LV | +| Flexibility | ❌ Fixed once created | ✅ Fully dynamic | +| Complexity | ✅ Simple to set up | ⚠️ Additional layer to manage | +| Performance | ✅ Slightly faster (no abstraction) | ⚠️ Minimal overhead | +| Boot partition | ✅ Standard | ⚠️ Some bootloaders need non-LVM /boot | + +> **💡 Verdict:** For production servers, **always use LVM**. The flexibility to resize and extend without downtime far outweighs the minimal complexity overhead. 
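
One habit that pairs well with that flexibility: check VG free space before every extend. A sketch with the walkthrough's numbers hard-coded — on a live system you would read the free-space figure with something like `vgs --noheadings --units m -o vg_free devops-vg` instead:

```shell
# Guard before `lvextend`: refuse the operation if the VG can't satisfy it.
# Values are hard-coded from the example for illustration.
vg_free_mib=520
extend_by_mib=200

if [ "$extend_by_mib" -le "$vg_free_mib" ]; then
  echo "OK: extend by ${extend_by_mib}M (${vg_free_mib}M free in VG)"
else
  echo "ABORT: only ${vg_free_mib}M free in VG" >&2
fi
```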
+ +--- + +## 🔄 Common LVM Scenarios in DevOps + +### Scenario 1: Application Running Out of Disk Space + +```bash +# Check which LV is full +df -h + +# Extend it by 5GB (if VG has free space) +lvextend -L +5G --resizefs /dev/app-vg/app-data + +# If VG is also full, add a new disk first +pvcreate /dev/sdc +vgextend app-vg /dev/sdc +lvextend -L +5G --resizefs /dev/app-vg/app-data +``` + +### Scenario 2: Creating a Snapshot Before Deployment + +```bash +# Create a snapshot (safety net before risky changes) +lvcreate -L 1G -s -n app-data-snapshot /dev/app-vg/app-data + +# If deployment fails, restore from snapshot +lvconvert --merge /dev/app-vg/app-data-snapshot +``` + +### Scenario 3: Setting Up Separate Volumes for Logs, Data, and App + +```bash +# Create purpose-specific LVs from one VG +lvcreate -L 10G -n lv-app app-vg +lvcreate -L 20G -n lv-data app-vg +lvcreate -L 5G -n lv-logs app-vg + +# Format and mount each +mkfs.ext4 /dev/app-vg/lv-app +mkfs.ext4 /dev/app-vg/lv-data +mkfs.ext4 /dev/app-vg/lv-logs + +mount /dev/app-vg/lv-app /opt/app +mount /dev/app-vg/lv-data /var/data +mount /dev/app-vg/lv-logs /var/log/app +``` + +--- + +## 🧹 Cleanup (After Practice) + +If using virtual disks for practice, clean up when done: + +```bash +# Step 1: Unmount the filesystem +umount /mnt/app-data + +# Step 2: Remove the Logical Volume +lvremove /dev/devops-vg/app-data + +# Step 3: Remove the Volume Group +vgremove devops-vg + +# Step 4: Remove the Physical Volume +pvremove /dev/loop0 + +# Step 5: Detach the loop device +losetup -d /dev/loop0 + +# Step 6: Delete the virtual disk image +rm /tmp/disk1.img +``` + +> **⚠️ Important:** Always clean up in **reverse order** (LV → VG → PV → disk). Trying to remove a VG before its LVs will fail. + +--- + +## 💡 What I Learned + +### 1. LVM Provides Dynamic Storage That Traditional Partitions Cannot +The ability to **extend a mounted volume** without downtime is game-changing. 
In production, an `lvextend --resizefs` command at 3 AM can save you from a full outage caused by a disk-full condition — no reboot, no unmounting, no data migration needed.

### 2. The Three-Layer Architecture is the Key to Understanding LVM
Once you grasp that **PV → VG → LV** mirrors **brick → pool → partition**, every LVM command makes logical sense. Each layer's commands follow the same naming pattern (`pvcreate`/`vgcreate`/`lvcreate`), making the entire system predictable and learnable.

### 3. Always Resize the Filesystem After Extending the LV
The `lvextend` command only grows the **logical volume** (the block device). The **filesystem** on top of it doesn't automatically grow to fill the new space — you must run `resize2fs` (for ext4) or `xfs_growfs` (for XFS) to expand it. Forgetting this step is a classic mistake that makes it look like `lvextend` "didn't work." Using `--resizefs` with `lvextend` avoids this pitfall entirely.

---

## 🔍 Troubleshooting Guide

| Issue | Cause | Solution |
|-------|-------|----------|
| `pvcreate` fails with "Device in use" | Disk is already mounted or partitioned | Unmount first: `umount /dev/sdb1` |
| `vgcreate` fails with "PV not found" | PV wasn't created | Run `pvcreate /dev/sdb` first |
| `lvcreate` "Insufficient free space" | VG doesn't have enough room | Check `vgs` for free space; add more PVs with `vgextend` |
| `lvextend` succeeds but `df -h` shows old size | Filesystem not resized | Run `resize2fs /dev/vg/lv` (ext4) or `xfs_growfs /mnt/point` (XFS) |
| Mount lost after reboot | Not in `/etc/fstab` | Add entry: `/dev/vg/lv /mount/point ext4 defaults 0 2` |
| `lvreduce` warns about data loss | Shrinking can destroy data | **Always back up first**, and shrink the filesystem before the LV |
| Loop device not showing up | `losetup` didn't attach | Re-run `losetup -fP /tmp/disk.img` and check `losetup -a` |
| `mkfs` fails on LV | LV path is wrong | Use `/dev/vg-name/lv-name` or `/dev/mapper/vg--name-lv--name`
| + +--- diff --git a/2026/day-14/day-14-networking.md b/2026/day-14/day-14-networking.md new file mode 100644 index 0000000000..137c71f908 --- /dev/null +++ b/2026/day-14/day-14-networking.md @@ -0,0 +1,642 @@ +# Day 14 – Networking Fundamentals & Hands-on Checks + +**Date:** 2026-02-18 +**Author:** Rameez Ahmed +**Challenge:** Master core networking concepts and essential troubleshooting commands +**Target Host:** `google.com` (used consistently across all commands) + +--- + +## 📋 Overview + +Networking is the **circulatory system of DevOps**. Every deployment, every API call, every monitoring alert travels over a network. Today's challenge builds the foundation to **diagnose connectivity issues**, understand **how data flows** between systems, and run the **exact commands** you'll use during real incidents at 3 AM. + +> **🎯 Goal:** Be able to answer: *"Is the service reachable? If not, where exactly is it breaking?"* + +--- + +## 🌐 Quick Concepts: OSI vs TCP/IP Models + +### The Two Network Models Side by Side + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ OSI MODEL (7 Layers) TCP/IP MODEL (4 Layers) │ +│ ───────────────────────── ─────────────────────────── │ +│ │ +│ L7 ┌─────────────────────┐ │ +│ │ Application │ │ +│ L6 ├─────────────────────┤ ┌──────────────────────────┐ │ +│ │ Presentation │ │ Application │ │ +│ L5 ├─────────────────────┤ │ (HTTP, DNS, SSH, SMTP) │ │ +│ │ Session │ └────────────┬─────────────┘ │ +│ └──────────┬──────────┘ │ │ +│ │ │ │ +│ L4 ┌──────────▼──────────┐ ┌────────────▼─────────────┐ │ +│ │ Transport │ │ Transport │ │ +│ │ (TCP / UDP) │ │ (TCP / UDP) │ │ +│ └──────────┬──────────┘ └────────────┬─────────────┘ │ +│ │ │ │ +│ L3 ┌──────────▼──────────┐ ┌────────────▼─────────────┐ │ +│ │ Network │ │ Internet │ │ +│ │ (IP, ICMP) │ │ (IP, ICMP, ARP) │ │ +│ └──────────┬──────────┘ └────────────┬─────────────┘ │ +│ │ │ │ +│ L2 ┌──────────▼──────────┐ │ │ +│ │ Data Link │ ┌────────────▼─────────────┐ │ 
+│ L1 ├─────────────────────┤ │ Network Access │ │ +│ │ Physical │ │ (Ethernet, Wi-Fi, ARP) │ │ +│ └─────────────────────┘ └──────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Where Key Protocols Live + +| Protocol | OSI Layer | TCP/IP Layer | What It Does | +|----------|-----------|-------------|--------------| +| **HTTP/HTTPS** | L7 — Application | Application | Web traffic, API calls | +| **DNS** | L7 — Application | Application | Translates domain names to IP addresses | +| **SSH** | L7 — Application | Application | Secure remote shell access | +| **TCP** | L4 — Transport | Transport | Reliable, ordered delivery (connections) | +| **UDP** | L4 — Transport | Transport | Fast, connectionless delivery (DNS, video) | +| **IP** | L3 — Network | Internet | Addressing and routing between networks | +| **ICMP** | L3 — Network | Internet | Ping, traceroute, error messaging | +| **Ethernet** | L2 — Data Link | Network Access | Local network frame delivery | + +### Real-World Example: What Happens When You Run `curl https://google.com` + +``` + You type: curl https://google.com + │ + ▼ + ┌─────────────────────────────────────────────────────────────┐ + │ L7 — APPLICATION │ + │ curl builds an HTTP GET request │ + │ HTTPS = HTTP + TLS encryption │ + └──────────────────────────┬──────────────────────────────────┘ + │ + ┌──────────────────────────▼──────────────────────────────────┐ + │ L7 — DNS RESOLUTION │ + │ "google.com" → DNS query → 142.250.193.206 │ + │ (Asks: What IP address does this domain point to?) 
│ + └──────────────────────────┬──────────────────────────────────┘ + │ + ┌──────────────────────────▼──────────────────────────────────┐ + │ L4 — TRANSPORT (TCP) │ + │ 3-way handshake: SYN → SYN-ACK → ACK │ + │ Establishes reliable connection to port 443 │ + └──────────────────────────┬──────────────────────────────────┘ + │ + ┌──────────────────────────▼──────────────────────────────────┐ + │ L3 — NETWORK (IP) │ + │ Packet: src=192.168.1.10 → dst=142.250.193.206 │ + │ Routed hop-by-hop across the internet │ + └──────────────────────────┬──────────────────────────────────┘ + │ + ┌──────────────────────────▼──────────────────────────────────┐ + │ L2/L1 — DATA LINK / PHYSICAL │ + │ Ethernet frame → your router → ISP → Google's datacenter │ + │ Electrical signals / light pulses over cables / Wi-Fi │ + └─────────────────────────────────────────────────────────────┘ +``` + +> **💡 Key Takeaway:** Every network request traverses **all layers** — from your application (L7) down to the physical wire (L1), across the network, and back up the stack on the remote server. Understanding this helps you **pinpoint exactly where a failure occurs**. + +--- + +## 🔧 Hands-on Checklist + +### 1. 🏷️ Identity — "Who Am I on the Network?" + +```bash +# View your IP address(es) +hostname -I +``` + +**Expected Output:** +``` +192.168.1.10 +``` + +```bash +# More detailed: view all network interfaces +ip addr show +``` + +**Expected Output:** +``` +1: lo: mtu 65536 + inet 127.0.0.1/8 scope host lo +2: eth0: mtu 1500 + inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0 +``` + +> **📌 Observation:** The machine has two interfaces: +> - `lo` (loopback) — `127.0.0.1` — used for internal communication +> - `eth0` (ethernet) — `192.168.1.10` — the actual network-facing IP + +--- + +### 2. 📡 Reachability — "Can I Reach the Target?" + +```bash +# Ping the target host (4 packets) +ping -c 4 google.com +``` + +**Expected Output:** +``` +PING google.com (142.250.193.206) 56(84) bytes of data. 
64 bytes from 142.250.193.206: icmp_seq=1 ttl=117 time=12.3 ms
64 bytes from 142.250.193.206: icmp_seq=2 ttl=117 time=11.8 ms
64 bytes from 142.250.193.206: icmp_seq=3 ttl=117 time=12.1 ms
64 bytes from 142.250.193.206: icmp_seq=4 ttl=117 time=11.9 ms

--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 11.8/12.025/12.3/0.183 ms
```

> **📌 Observation:**
> - **Latency:** ~12ms average — excellent response time (< 50ms is good for internet targets)
> - **Packet loss:** 0% — network path is clean
> - **TTL:** 117 — the reply arrived with 117 hops of its TTL budget remaining (it likely started at 128, so the path to Google is ~11 hops)

---

### 3. 🛤️ Path — "What Route Does My Traffic Take?"

```bash
# Trace the path to the target
traceroute google.com
```

**Expected Output:**
```
traceroute to google.com (142.250.193.206), 30 hops max, 60 byte packets
 1  gateway (192.168.1.1)  1.234 ms  1.123 ms  1.056 ms
 2  isp-router.example.net  5.678 ms  5.432 ms  5.321 ms
 3  core-router.isp.net  10.234 ms  9.876 ms  10.123 ms
 4  * * *
 5  google-peer.net  11.234 ms  11.123 ms  11.056 ms
 6  142.250.193.206  12.345 ms  12.234 ms  12.123 ms
```

> **📌 Observation:**
> - **Hop 1** (1ms) — Local gateway/router — very fast
> - **Hop 4** (`* * *`) — Timeout — some routers block ICMP/traceroute (normal, not a problem)
> - **Hop 6** — Reached Google at ~12ms — consistent with our ping results
> - **No unusually long hops** — network path is healthy

> **💡 Troubleshooting Tip:** If traceroute shows `* * *` for ALL hops after a certain point, traffic is likely being **blocked by a firewall** at that hop.

---

### 4. 🚪 Ports — "What Services Are Listening?"
+ +```bash +# List all listening TCP/UDP ports with process names +ss -tulpn +``` + +**Expected Output:** +``` +Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process +tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3)) +tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=5678,fd=6)) +tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1234,fd=4)) +udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=789,fd=13)) +``` + +> **📌 Observation:** +> - **SSH (port 22)** — Listening on all interfaces (`0.0.0.0`) — remote access enabled +> - **Nginx (port 80)** — Web server active — serving HTTP traffic +> - **DNS (port 53)** — Listening on localhost only (`127.0.0.53`) — local resolver +> - **No unexpected ports** — system looks clean + +#### Understanding the `ss` Flags + +| Flag | Meaning | +|------|---------| +| `-t` | Show **TCP** sockets | +| `-u` | Show **UDP** sockets | +| `-l` | Show only **listening** sockets | +| `-p` | Show **process** using the socket | +| `-n` | Show **numeric** ports (don't resolve names) | + +--- + +### 5. 🔍 Name Resolution — "Does DNS Work?" + +```bash +# Resolve a domain name to IP using dig +dig google.com +``` + +**Expected Output (key section):** +``` +;; QUESTION SECTION: +;google.com. IN A + +;; ANSWER SECTION: +google.com. 
300 IN A 142.250.193.206 + +;; Query time: 15 msec +;; SERVER: 127.0.0.53#53(127.0.0.53) +``` + +> **📌 Observation:** +> - **Resolved IP:** `142.250.193.206` — DNS is working correctly +> - **TTL:** 300 seconds (5 minutes) — this record is cached for 5 min +> - **Query time:** 15ms — fast DNS resolution +> - **DNS Server:** `127.0.0.53` — using the local systemd-resolved stub + +```bash +# Alternative: nslookup (simpler output) +nslookup google.com +``` + +**Expected Output:** +``` +Server: 127.0.0.53 +Address: 127.0.0.53#53 + +Non-authoritative answer: +Name: google.com +Address: 142.250.193.206 +``` + +> **💡 `dig` vs `nslookup`:** Both resolve DNS, but `dig` provides more detail (TTL, record type, query time, authoritative server). Prefer `dig` for troubleshooting. + +--- + +### 6. 🌍 HTTP Check — "Is the Web Service Responding?" + +```bash +# Fetch HTTP headers only (no body) +curl -I https://google.com +``` + +**Expected Output:** +``` +HTTP/2 301 +location: https://www.google.com/ +content-type: text/html; charset=UTF-8 +date: Tue, 18 Feb 2026 18:50:00 GMT +server: gws +content-length: 220 +``` + +> **📌 Observation:** +> - **Status Code: `301`** — Permanent redirect from `google.com` → `www.google.com` +> - **Protocol:** HTTP/2 — Google uses the latest HTTP version +> - **Server:** `gws` (Google Web Server) + +```bash +# Follow the redirect to get the final response +curl -I -L https://google.com +``` + +**Expected Output (final hop):** +``` +HTTP/2 200 +content-type: text/html; charset=ISO-8859-1 +date: Tue, 18 Feb 2026 18:50:00 GMT +server: gws +``` + +> **✅ Status Code: `200 OK`** — The service is fully operational. + +#### Common HTTP Status Codes for DevOps + +| Code | Meaning | What to Check | +|------|---------|---------------| +| `200` | ✅ OK — Service is healthy | Nothing — all good! 
| +| `301/302` | ↪️ Redirect | Follow with `curl -L`; check if redirect target is correct | +| `403` | 🚫 Forbidden | Check file permissions, authentication, or IP whitelisting | +| `404` | ❓ Not Found | Check URL path, deployment, or nginx/apache config | +| `500` | 💥 Internal Server Error | Check application logs (`journalctl`, app log files) | +| `502` | 🔌 Bad Gateway | Upstream server is down; check backend service | +| `503` | 🔧 Service Unavailable | Service overloaded or in maintenance mode | +| `504` | ⏱️ Gateway Timeout | Backend is too slow; check performance/resources | + +--- + +### 7. 📊 Connections Snapshot — "What's Connected Right Now?" + +```bash +# View active network connections +netstat -an | head -20 +``` + +**Expected Output:** +``` +Active Internet connections (servers and established) +Proto Recv-Q Send-Q Local Address Foreign Address State +tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN +tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN +tcp 0 0 192.168.1.10:22 192.168.1.5:54321 ESTABLISHED +tcp 0 0 192.168.1.10:80 203.0.113.45:12345 ESTABLISHED +tcp 0 0 192.168.1.10:80 198.51.100.20:23456 TIME_WAIT +``` + +```bash +# Count connections by state +ss -s +``` + +**Expected Output:** +``` +Total: 180 +TCP: 12 (estab 3, closed 2, orphaned 0, timewait 2) +``` + +> **📌 Observation:** +> - **LISTEN:** 2 services (SSH on 22, Nginx on 80) — these are waiting for connections +> - **ESTABLISHED:** 3 active connections — someone is connected to SSH and web +> - **TIME_WAIT:** 2 connections — recently closed, waiting for cleanup (normal) + +#### Connection States Explained + +``` + Client Server + │ │ + │──── SYN ──────────────────────▶│ ← SYN_SENT + │ │ ← SYN_RECEIVED + │◀─── SYN-ACK ─────────────────│ + │ │ + │──── ACK ──────────────────────▶│ ← ESTABLISHED ✅ + │ │ + │◀───── DATA ──────────────────│ (bidirectional) + │──────DATA ───────────────────▶│ + │ │ + │──── FIN ──────────────────────▶│ ← FIN_WAIT_1 + │◀─── ACK ─────────────────────│ ← FIN_WAIT_2 + │◀─── FIN 
─────────────────────│ ← CLOSE_WAIT + │──── ACK ──────────────────────▶│ ← TIME_WAIT + │ │ (waits 2×MSL) + │ CLOSED │ ← CLOSED +``` + +--- + +## 🎯 Mini Task: Port Probe & Interpret + +### Step 1: Identify a Listening Port + +```bash +ss -tulpn | grep LISTEN +``` + +**Output:** +``` +tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3)) +``` + +> **Selected:** SSH service on port **22** + +--- + +### Step 2: Test the Port + +```bash +# Probe port 22 using netcat +nc -zv localhost 22 +``` + +**Expected Output:** +``` +Connection to localhost (127.0.0.1) 22 port [tcp/ssh] succeeded! +``` + +```bash +# Alternative: test using curl (for HTTP services) +curl -I http://localhost:80 +``` + +**Expected Output:** +``` +HTTP/1.1 200 OK +Server: nginx/1.24.0 +``` + +--- + +### Step 3: Interpretation + +> **✅ Port 22 (SSH) is reachable from localhost.** The `sshd` service is running and accepting connections. If it were NOT reachable, the next checks would be: +> 1. **Service status:** `systemctl status sshd` — is the service running? +> 2. **Firewall rules:** `iptables -L -n` or `ufw status` — is the port blocked? +> 3. **Bind address:** `ss -tlnp | grep 22` — is it listening on the right interface? + +--- + +## 🧩 Troubleshooting Decision Tree + +When something is "not working" on the network, follow this **layered approach** (bottom-up): + +``` + 🔴 "It's not working!" + │ + ▼ + ┌───────────────────────────────┐ + │ Can you PING the target? │ + │ ping │ + └───────────┬───────────┬────────┘ + YES │ │ NO + ▼ ▼ + ┌──────────────┐ ┌──────────────────────┐ + │ DNS works? │ │ Check: │ + │ dig │ │ • ip addr (have IP?) │ + │ │ │ • ip route (gateway?) │ + │ │ │ • Physical cable/WiFi │ + └──────┬───┬───┘ └──────────────────────┘ + YES │ │ NO + ▼ ▼ + ┌──────────────┐ ┌─────────────────────────┐ + │ Port open? 
│ │ Check: │ + │ nc -zv │ │ • /etc/resolv.conf │ + │ │ │ • dig @8.8.8.8 │ + │ │ │ • systemd-resolved │ + └──────┬───┬───┘ └─────────────────────────┘ + YES │ │ NO + ▼ ▼ + ┌──────────────┐ ┌──────────────────────────┐ + │ HTTP status? │ │ Check: │ + │ curl -I │ │ • systemctl status │ + │ │ │ • iptables -L -n │ + │ │ │ • ufw status │ + └──────┬───┬───┘ └──────────────────────────┘ + 200 │ │ 5xx + ▼ ▼ + ┌─────────┐ ┌────────────────────────────┐ + │ ✅ GOOD │ │ Check application logs: │ + │ Service │ │ • journalctl -u │ + │ healthy │ │ • tail -f /var/log/app.log │ + │ │ │ • docker logs │ + └─────────┘ └────────────────────────────┘ +``` + +--- + +## 🤔 Reflections + +### Which command gives you the fastest signal when something is broken? + +> **`ping`** — In under 2 seconds you know if the target is reachable. Zero latency to run, instant result. If ping fails, you immediately know it's a **network/infrastructure issue** (L3 or below) rather than an application issue. It's the "heartbeat check" of troubleshooting. +> +> **Runner-up:** `curl -I ` — Takes 1-2 seconds and tells you if the **application layer** (L7) is working. If ping works but curl fails, you've narrowed the problem to L4-L7. + +--- + +### What layer would you inspect next if DNS fails? + +> **If DNS fails → Inspect L7 (Application) and L3 (Network):** +> 1. Check if the DNS **server itself** is reachable: `ping 8.8.8.8` (Google DNS) +> - If ping works → DNS server/config issue (L7) → Check `/etc/resolv.conf` +> - If ping fails → Network issue (L3) → Check routing with `ip route` +> 2. Try an alternative DNS server: `dig @8.8.8.8 google.com` +> - If this works → Your configured DNS server is down, not the network + +--- + +### What layer would you inspect if HTTP 500 shows up? 
+
+> **If HTTP 500 → Inspect L7 (Application) exclusively:**
+> - The network is fine (request reached the server and got a response)
+> - The problem is **inside the application** code or its dependencies
+> - **Check:** Application logs (`journalctl -u app`), database connectivity, disk space, memory
+
+---
+
+### Two follow-up checks in a real incident:
+
+| # | Check | Command | Why |
+|---|-------|---------|-----|
+| 1 | **Resource pressure** | `top` / `df -h` / `free -h` | A server can be "reachable" but failing due to CPU/memory/disk exhaustion |
+| 2 | **Recent changes** | `journalctl --since "1 hour ago"` + `git log -5` | Most incidents correlate with a recent deployment or config change |
+
+---
+
+## 📝 Complete Networking Command Reference
+
+### Connectivity & Diagnostics
+
+| Command | Purpose | Layer |
+|---------|---------|-------|
+| `hostname -I` | Show local IP addresses | Identity |
+| `ip addr show` | Detailed interface info (IPs, MACs, state) | L2/L3 |
+| `ip route` | Show routing table (where traffic goes) | L3 |
+| `ping -c 4 <host>` | Test basic reachability (ICMP) | L3 |
+| `traceroute <host>` | Trace the path packets take | L3 |
+| `mtr <host>` | Combined ping + traceroute (live) | L3 |
+
+### DNS
+
+| Command | Purpose | Layer |
+|---------|---------|-------|
+| `dig <domain>` | Detailed DNS lookup | L7 |
+| `dig +short <domain>` | Quick IP-only DNS lookup | L7 |
+| `dig @8.8.8.8 <domain>` | Query a specific DNS server | L7 |
+| `nslookup <domain>` | Simple DNS lookup | L7 |
+| `cat /etc/resolv.conf` | Check configured DNS servers | Config |
+
+### Ports & Connections
+
+| Command | Purpose | Layer |
+|---------|---------|-------|
+| `ss -tulpn` | List listening ports with processes | L4 |
+| `ss -s` | Connection state summary | L4 |
+| `netstat -an` | All active connections | L4 |
+| `nc -zv <host> <port>` | Test if a specific port is open | L4 |
+| `lsof -i :<port>` | Which process is using a port | L4 |
+
+### HTTP & Application
+
+| Command | Purpose | Layer |
+|---------|---------|-------|
+| `curl -I <url>` | Fetch HTTP headers (status code) | L7 |
+| `curl -I -L <url>` | Follow redirects | L7 |
+| `curl -v <url>` | Verbose output (TLS, headers, body) | L7 |
+| `curl -o /dev/null -s -w "%{http_code}" <url>` | Get just the status code | L7 |
+| `wget --spider <url>` | Check if URL is accessible | L7 |
+
+### Firewall
+
+| Command | Purpose | Layer |
+|---------|---------|-------|
+| `iptables -L -n` | List firewall rules | L3/L4 |
+| `ufw status` | UFW firewall status (Ubuntu) | L3/L4 |
+| `firewall-cmd --list-all` | firewalld status (RHEL/CentOS) | L3/L4 |
+
+---
+
+## 🏗️ Real-World DevOps Networking Scenarios
+
+### Scenario 1: "Website is Down!"
+
+```bash
+# Step 1: Can you reach the server at all?
+ping -c 3 myapp.example.com
+
+# Step 2: Is DNS resolving correctly?
+dig myapp.example.com
+
+# Step 3: Is the web server listening?
+nc -zv myapp.example.com 443
+
+# Step 4: What does the HTTP response say?
+curl -I https://myapp.example.com
+
+# Step 5: Check the service on the server
+ssh admin@myapp.example.com "systemctl status nginx"
+```
+
+### Scenario 2: "App Works Locally but Not From Outside"
+
+```bash
+# On the server: confirm it's listening
+ss -tulpn | grep 8080
+
+# Check if it's binding to 0.0.0.0 (all interfaces) vs 127.0.0.1 (localhost only)
+# 127.0.0.1:8080 → Only accessible locally!
+# 0.0.0.0:8080 → Accessible from outside ✅
+
+# Check firewall
+sudo iptables -L -n | grep 8080
+sudo ufw status | grep 8080
+```
+
+### Scenario 3: "DNS is Intermittently Failing"
+
+```bash
+# Test with your configured DNS
+dig example.com
+
+# Test with Google DNS (bypass local DNS)
+dig @8.8.8.8 example.com
+
+# Test with Cloudflare DNS
+dig @1.1.1.1 example.com
+
+# If external DNS works but local doesn't → local DNS issue
+cat /etc/resolv.conf
+systemctl status systemd-resolved
+```
+
+---
+
+## 💡 What I Learned
+
+### 1. Troubleshooting Is a Layered Process — Always Start from the Bottom
+The OSI/TCP-IP model isn't just academic theory — it's a **troubleshooting framework**.
Start from L1 (is the cable plugged in?) and work your way up. If `ping` works but `curl` fails, you've immediately eliminated L1-L3 and can focus on L4-L7. This systematic approach prevents wasting time on the wrong layer. + +### 2. `ss -tulpn` Is the Most Underrated DevOps Command +In production debugging, knowing what's **listening** on what port is half the battle. A service can be "running" (via `systemctl status`) but NOT listening on the expected port (crashed worker, wrong config). `ss -tulpn` bridges that gap — it tells you what's actually ready to accept connections. + +### 3. DNS Failures Masquerade as "Network Down" +When DNS fails, everything that uses domain names breaks — `curl`, `apt update`, application APIs, etc. But the network itself is fine! Running `ping 8.8.8.8` (by IP, not domain) instantly proves the network works and isolates DNS as the culprit. Always test both IP and domain names when diagnosing connectivity. + +--- + diff --git a/2026/day-15/day-15-networking-concepts.md b/2026/day-15/day-15-networking-concepts.md new file mode 100644 index 0000000000..6b45b64814 --- /dev/null +++ b/2026/day-15/day-15-networking-concepts.md @@ -0,0 +1,517 @@ +# Day 15 – Networking Concepts: DNS, IP, Subnets & Ports + +**Date:** 2026-02-19 +**Author:** Rameez Ahmed +**Challenge:** Understand the core building blocks of networking — DNS, IP addressing, CIDR/subnetting, and ports + +--- + +## 📋 Overview + +Building on the hands-on networking commands from Day 14, today dives deeper into the **concepts** behind those commands. Every `curl`, every `ping`, every deployment depends on DNS resolution, IP routing, subnets, and port mapping. By the end of today, the question *"Why can't my app connect?"* should be answerable systematically. + +--- + +## 🌍 Task 1: DNS — How Names Become IPs + +### What Happens When You Type `google.com` in a Browser? 
+ +When you type `google.com`, your browser doesn't know where to send the request — it only understands IP addresses. So a **DNS resolution chain** kicks off: your browser checks its local cache, then asks the OS resolver, which queries your configured DNS server (e.g., `8.8.8.8`). That server either has the answer cached or recursively queries the **root servers** → `.com` **TLD servers** → **Google's authoritative nameservers**. Within milliseconds, the domain name is translated to an IP like `142.250.193.206`, and the browser establishes a TCP connection to that IP on port 443 (HTTPS). + +### The DNS Resolution Journey (Visual) + +``` + You type: google.com + │ + ▼ + ┌──────────────────────┐ + │ 1️⃣ Browser Cache │ ── Hit? → Use cached IP ✅ + │ (checked first) │ + └──────────┬───────────┘ + Miss │ + ▼ + ┌──────────────────────┐ + │ 2️⃣ OS DNS Cache │ ── Hit? → Use cached IP ✅ + │ (/etc/hosts, stub) │ + └──────────┬───────────┘ + Miss │ + ▼ + ┌──────────────────────┐ + │ 3️⃣ Recursive DNS │ ── Hit in cache? → Return IP ✅ + │ Resolver (ISP or │ + │ 8.8.8.8 / 1.1.1.1) │ + └──────────┬───────────┘ + Miss │ + ▼ + ┌──────────────────────┐ + │ 4️⃣ Root DNS Server │ ── "I don't know google.com, + │ (13 globally) │ but ask the .com TLD server" + └──────────┬───────────┘ + │ Referral + ▼ + ┌──────────────────────┐ + │ 5️⃣ TLD Server │ ── "Ask Google's nameserver: + │ (.com zone) │ ns1.google.com" + └──────────┬───────────┘ + │ Referral + ▼ + ┌──────────────────────────────┐ + │ 6️⃣ Authoritative Server │ ── "google.com = 142.250.193.206" + │ (ns1.google.com) │ ✅ ANSWER! 
+ └──────────┬───────────────────┘ + │ + ▼ + ┌──────────────────────────────┐ + │ Result cached at each level │ + │ Browser → OS → Resolver │ + │ TTL: 300s (5 min) │ + └──────────────────────────────┘ +``` + +### DNS Record Types + +| Record Type | Full Name | What It Does | Example | +|-------------|-----------|-------------|---------| +| **A** | Address | Maps a domain to an **IPv4** address | `google.com → 142.250.193.206` | +| **AAAA** | IPv6 Address | Maps a domain to an **IPv6** address | `google.com → 2607:f8b0:4004:800::200e` | +| **CNAME** | Canonical Name | Creates an **alias** pointing to another domain (not an IP) | `www.example.com → example.com` | +| **MX** | Mail Exchange | Specifies the **mail server** for the domain (for receiving email) | `example.com → mail.example.com (priority 10)` | +| **NS** | Name Server | Specifies which DNS servers are **authoritative** for the domain | `google.com → ns1.google.com` | + +> **💡 Bonus Records DevOps Engineers Should Know:** + +| Record Type | What It Does | DevOps Use Case | +|-------------|-------------|-----------------| +| **TXT** | Stores arbitrary text data | SPF/DKIM email auth, domain verification, SSL validation | +| **SRV** | Service locator (host + port) | Kubernetes service discovery, SIP, LDAP | +| **PTR** | Reverse DNS (IP → domain) | Email server reputation, security audits | +| **SOA** | Start of Authority | Zone metadata — serial number, refresh interval | + +### Hands-On: `dig` Output Analysis + +```bash +dig google.com +``` + +**Output (key sections annotated):** + +``` +;; QUESTION SECTION: +;google.com. IN A ← "Give me the A record for google.com" + +;; ANSWER SECTION: +google.com. 
300 IN A 142.250.193.206 +│ │ │ │ +│ │ │ └── The IP address (A record value) +│ │ └── Record type: A (IPv4) +│ └── TTL: 300 seconds (cached for 5 minutes) +└── Domain queried + +;; Query time: 15 msec ← How long the DNS lookup took +;; SERVER: 127.0.0.53#53 ← Which DNS server answered +;; WHEN: Tue Feb 19 00:00:00 PKT 2026 +;; MSG SIZE rcvd: 55 ← Response size in bytes +``` + +> **📌 Key Findings:** +> - **A Record:** `142.250.193.206` — this is Google's IPv4 address +> - **TTL:** `300` — result is cached for 5 minutes; after that, a fresh query is needed +> - **Query Time:** 15ms — fast, likely answered from a nearby cache + +--- + +## 🔢 Task 2: IP Addressing + +### What Is an IPv4 Address? + +An IPv4 address is a **32-bit number** that uniquely identifies a device on a network. It's written as four **octets** (8-bit numbers) separated by dots, each ranging from 0 to 255. + +``` + 192 . 168 . 1 . 10 + │ │ │ │ + │ │ │ └── Host identifier (this specific device) + │ │ └── ─────── Network portion (depends on subnet mask) + │ └── ──────────────── + └── ───────────────────────── + + Binary: 11000000.10101000.00000001.00001010 + + Total possible addresses: 2³² = 4,294,967,296 (~4.3 billion) +``` + +### Public vs Private IPs + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ THE INTERNET │ +│ │ +│ Public IPs: Globally unique, routable on the internet │ +│ Example: 54.239.28.85 (AWS), 142.250.193.206 (Google) │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ Your Server │ │ Google Server │ │ +│ │ 54.239.28.85 │◄─────►│ 142.250.193.206 │ │ +│ └────────┬────────┘ └─────────────────┘ │ +│ │ │ +│ │ NAT (Network Address Translation) │ +│ │ │ +│ ┌────────▼──────────────────────────────────┐ │ +│ │ PRIVATE NETWORK (Your Office/Home) │ │ +│ │ │ │ +│ │ Private IPs: Only valid inside the LAN │ │ +│ │ NOT routable on the internet │ │ +│ │ │ │ +│ │ 🖥️ 192.168.1.10 (your laptop) │ │ +│ │ 🖥️ 192.168.1.11 (colleague's laptop) │ │ +│ │ 🖨️ 192.168.1.50 
(office printer) │ │ +│ │ 📱 192.168.1.105 (your phone) │ │ +│ └────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────┘ +``` + +| Feature | Public IP | Private IP | +|---------|-----------|------------| +| **Scope** | Globally unique across the internet | Only unique within the local network | +| **Routable?** | ✅ Yes — reachable from anywhere | ❌ No — only within the LAN | +| **Assigned by** | ISP or cloud provider | Router (DHCP) or manual config | +| **Example** | `54.239.28.85` | `192.168.1.10` | +| **Cost** | Paid (limited supply) | Free (unlimited within your network) | +| **Use case** | Web servers, APIs, public services | Internal apps, databases, printers | + +### Private IP Ranges (RFC 1918) + +| Class | Range | CIDR | Total Addresses | Common Use | +|-------|-------|------|-----------------|------------| +| **Class A** | `10.0.0.0` – `10.255.255.255` | `10.0.0.0/8` | 16,777,216 | Large enterprises, cloud VPCs (AWS, GCP) | +| **Class B** | `172.16.0.0` – `172.31.255.255` | `172.16.0.0/12` | 1,048,576 | Medium networks, Docker default | +| **Class C** | `192.168.0.0` – `192.168.255.255` | `192.168.0.0/16` | 65,536 | Home networks, small offices | + +> **🔒 Special Address:** `127.0.0.1` (localhost / loopback) — always refers to "this machine." Not a private IP — it's in a separate reserved range (`127.0.0.0/8`). + +### Hands-On: Identifying Your Private IPs + +```bash +ip addr show +``` + +**Expected Output (key section):** +``` +2: eth0: mtu 1500 + inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0 + └────────────┘ + Private IP! (192.168.x.x range = Class C private) +``` + +> **📌 Analysis:** `192.168.1.10` falls within the `192.168.0.0/16` private range → This is a **private IP**, not directly accessible from the internet. + +--- + +## 🧮 Task 3: CIDR & Subnetting + +### What Does `/24` Mean? 
+ +CIDR (Classless Inter-Domain Routing) notation like `/24` tells you how many bits of the 32-bit IP address are used for the **network portion**. The remaining bits identify individual **hosts**. + +``` + IP Address: 192.168.1.0 /24 + + Binary: 11000000.10101000.00000001 . 00000000 + ├──── Network (24 bits) ────┤├ Hosts ┤ + (8 bits) + + Subnet Mask: 255.255.255.0 + + Network ID: 192.168.1.0 (first address — identifies the network) + Broadcast: 192.168.1.255 (last address — reaches all hosts) + Usable Range: 192.168.1.1 – 192.168.1.254 + Usable Hosts: 2⁸ - 2 = 254 + └── minus network ID and broadcast +``` + +### Why Do We Subnet? + +Subnetting is like **dividing a large office floor into separate rooms**. Without it, every device would be in one giant network, creating: + +1. **🔒 Security Risk** — A compromised device could reach everything. Subnets create boundaries (e.g., separate web servers from databases) +2. **📡 Broadcast Storms** — Every broadcast reaches ALL devices. Subnets limit the blast radius +3. **📊 Efficient IP Usage** — A company with 50 devices doesn't need 65,536 IPs (`/16`). A `/26` (62 hosts) is more appropriate +4. 
**🏗️ Logical Organization** — `10.0.1.0/24` for production, `10.0.2.0/24` for staging, `10.0.3.0/24` for databases + +### CIDR Table (Filled) + +| CIDR | Subnet Mask | Network Bits | Host Bits | Total IPs | Usable Hosts | Common Use | +|------|-------------|-------------|-----------|-----------|-------------|------------| +| `/8` | `255.0.0.0` | 8 | 24 | 16,777,216 | 16,777,214 | Massive corporate networks | +| `/16` | `255.255.0.0` | 16 | 16 | 65,536 | 65,534 | Cloud VPCs, large campuses | +| `/20` | `255.255.240.0` | 20 | 12 | 4,096 | 4,094 | AWS default VPC subnets | +| `/24` | `255.255.255.0` | 24 | 8 | 256 | 254 | Most common — small to mid networks | +| `/26` | `255.255.255.192` | 26 | 6 | 64 | 62 | Small teams, database subnets | +| `/28` | `255.255.255.240` | 28 | 4 | 16 | 14 | Very small subnets, point-to-point | +| `/30` | `255.255.255.252` | 30 | 2 | 4 | 2 | Router-to-router links | +| `/32` | `255.255.255.255` | 32 | 0 | 1 | 1 | Single host (loopback, specific route) | + +> **🧮 Formula:** `Usable Hosts = 2^(32 - CIDR) - 2` +> The `-2` accounts for the **network address** (first) and **broadcast address** (last), which can't be assigned to hosts. 
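The formula is easy to script for quick checks during subnet planning. A minimal Bash sketch (the `usable_hosts` function name is illustrative, not a standard tool; /31 and /32 are treated as special cases because they have no separate network/broadcast addresses):

```shell
#!/bin/bash
# usable_hosts: apply the formula 2^(32 - prefix) - 2
usable_hosts() {
  local prefix=$1
  local total=$(( 2 ** (32 - prefix) ))
  if [ "$prefix" -ge 31 ]; then
    # /31 (point-to-point, RFC 3021) and /32 (single host):
    # no network/broadcast pair to subtract
    echo "$total"
  else
    # subtract the network address and the broadcast address
    echo $(( total - 2 ))
  fi
}

usable_hosts 24   # → 254
usable_hosts 26   # → 62
usable_hosts 30   # → 2
```

This matches the CIDR table above; each decrease of the prefix by one doubles the result.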
+ +### Visual: How Subnetting Splits a Network + +``` + Original: 192.168.1.0/24 (254 usable hosts) + + Split into 4 subnets (/26 each): + + ┌─────────────────────────────────────────────────────┐ + │ 192.168.1.0/26 │ Hosts: .1 – .62 (62 hosts)│ + │ 🌐 Web Servers │ Gateway: .1 │ + ├─────────────────────────────────────────────────────┤ + │ 192.168.1.64/26 │ Hosts: .65 – .126 (62 hosts)│ + │ 🗄️ Database Servers │ Gateway: .65 │ + ├─────────────────────────────────────────────────────┤ + │ 192.168.1.128/26 │ Hosts: .129 – .190 (62 hosts│ + │ 🔧 DevOps/CI-CD │ Gateway: .129 │ + ├─────────────────────────────────────────────────────┤ + │ 192.168.1.192/26 │ Hosts: .193 – .254 (62 hosts│ + │ 👥 Office Devices │ Gateway: .193 │ + └─────────────────────────────────────────────────────┘ +``` + +--- + +## 🚪 Task 4: Ports — The Doors to Services + +### What Is a Port? + +A port is a **logical endpoint** (numbered 0–65535) that identifies a specific service on a machine. While an IP address gets traffic to the right **machine**, the port gets it to the right **application** on that machine. + +``` + Analogy: An IP address is like a building's street address. + A port is like the apartment number inside the building. 
+ + ┌──────────────────────────────────────────────┐ + │ Server: 192.168.1.10 │ + │ ┌──────────────────────────────────────────┐│ + │ │ 🔑 Port 22 — SSH (remote access) ││ + │ ├──────────────────────────────────────────┤│ + │ │ 🌐 Port 80 — HTTP (web traffic) ││ + │ ├──────────────────────────────────────────┤│ + │ │ 🔒 Port 443 — HTTPS (secure web) ││ + │ ├──────────────────────────────────────────┤│ + │ │ 🗄️ Port 3306 — MySQL (database) ││ + │ ├──────────────────────────────────────────┤│ + │ │ 📊 Port 9090 — Prometheus (monitoring) ││ + │ └──────────────────────────────────────────┘│ + │ 65,536 possible ports — each a separate door│ + └──────────────────────────────────────────────┘ +``` + +### Port Ranges + +| Range | Name | Description | +|-------|------|-------------| +| `0 – 1023` | **Well-Known Ports** | Reserved for standard services (HTTP, SSH, DNS). Require root/sudo to bind. | +| `1024 – 49151` | **Registered Ports** | Used by applications (MySQL, Redis, custom apps). No root needed. | +| `49152 – 65535` | **Dynamic/Ephemeral** | Temporarily assigned for client-side connections. Auto-assigned by OS. 
| + +### Common Ports Every DevOps Engineer Must Know + +| Port | Service | Protocol | Description | DevOps Context | +|------|---------|----------|-------------|----------------| +| **22** | SSH | TCP | Secure Shell — remote command-line access | Server management, `scp`, `sftp`, Git over SSH | +| **80** | HTTP | TCP | Unencrypted web traffic | Web servers (Nginx, Apache), health checks | +| **443** | HTTPS | TCP | Encrypted web traffic (HTTP + TLS) | Production web apps, APIs, certificates | +| **53** | DNS | UDP/TCP | Domain name resolution | `dig`, `nslookup`, internal DNS servers | +| **3306** | MySQL | TCP | MySQL database connections | Application → database connectivity | +| **5432** | PostgreSQL | TCP | PostgreSQL database connections | Modern app stacks, cloud databases | +| **6379** | Redis | TCP | In-memory cache/data store | Session storage, caching, pub/sub | +| **27017** | MongoDB | TCP | NoSQL document database | MERN/MEAN stack applications | +| **8080** | HTTP Alt | TCP | Alternative HTTP port | Development servers, Tomcat, Jenkins | +| **9090** | Prometheus | TCP | Monitoring metrics endpoint | Infrastructure monitoring | +| **2379** | etcd | TCP | Key-value store for Kubernetes | K8s cluster state storage | +| **6443** | K8s API | TCP | Kubernetes API server | `kubectl` commands, cluster management | +| **3000** | Grafana | TCP | Visualization dashboards | Monitoring and alerting | +| **5601** | Kibana | TCP | Elasticsearch dashboards | Log analysis (ELK stack) | + +### Hands-On: Matching Listening Ports to Services + +```bash +ss -tulpn +``` + +**Expected Output:** +``` +Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process +tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1234,fd=3)) +tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=5678,fd=6)) +udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=789,fd=13)) +``` + +> **📌 Port-to-Service Matching:** +> +> | Port | Process | Service | Match 
✅ | +> |------|---------|---------|---------| +> | `22` | `sshd` | SSH — Secure Shell | ✅ Well-known port 22 = SSH | +> | `80` | `nginx` | HTTP — Web Server | ✅ Well-known port 80 = HTTP | +> | `53` | `systemd-resolve` | DNS — Name Resolution | ✅ Well-known port 53 = DNS | + +--- + +## 🧩 Task 5: Putting It All Together + +### Scenario 1: `curl http://myapp.com:8080` — What's Involved? + +``` + curl http://myapp.com:8080 + │ │ │ + │ │ └── PORT: 8080 (Task 4 — Transport layer port) + │ └──────────── DNS: "myapp.com" resolved to an IP (Task 1) + └────────────────────── HTTP: Application-layer protocol (Day 14) + + Full flow: + 1. DNS Resolution (Task 1): myapp.com → 10.0.1.20 (could be A record) + 2. IP Routing (Task 2): Traffic routes to 10.0.1.20 (private IP in VPC) + 3. Subnet Check (Task 3): Are client & server in the same /24? If not, goes via gateway + 4. Port Connection (Task 4): TCP connection to port 8080 on the target + 5. HTTP Request: GET / HTTP/1.1 sent over the established connection +``` + +> **Every concept from today is involved in a single `curl` command!** DNS resolves the name, IP addressing routes the packet, subnets determine the path, and ports deliver it to the right application. + +--- + +### Scenario 2: App Can't Reach Database at `10.0.1.50:3306` — Troubleshooting + +``` + App ──✘──► 10.0.1.50:3306 (MySQL) + + Troubleshooting checklist (in order): +``` + +| # | Check | Command | What You're Testing | +|---|-------|---------|---------------------| +| 1 | **Is the IP reachable?** | `ping 10.0.1.50` | Network/routing (L3) | +| 2 | **Is the port open?** | `nc -zv 10.0.1.50 3306` | MySQL is listening (L4) | +| 3 | **Is MySQL actually running?** | `systemctl status mysql` (on DB server) | Service status | +| 4 | **Is MySQL binding to the right interface?** | `ss -tlnp \| grep 3306` | `127.0.0.1` = local only! 
Need `0.0.0.0` | +| 5 | **Is there a firewall blocking?** | `iptables -L -n \| grep 3306` | Security group / iptables rule | +| 6 | **Are they in the same subnet?** | Compare CIDRs | If different subnets, check routing between them | +| 7 | **Is MySQL allowing the connection?** | Check `mysql.user` table | MySQL's own auth (host whitelist) | + +> **🔑 Most common cause:** MySQL is bound to `127.0.0.1` (localhost only) in `/etc/mysql/my.cnf` instead of `0.0.0.0`. Change `bind-address = 0.0.0.0` and restart. + +--- + +## 🔗 How All Networking Concepts Connect + +``` + ┌──────────────────────────────────────────────────────────────────┐ + │ A SINGLE API REQUEST │ + │ │ + │ curl https://api.myapp.com/users │ + │ │ │ │ │ + │ │ │ └── Path (Application layer) │ + │ │ └── Domain → DNS resolves to IP │ + │ └── HTTPS = Port 443 │ + │ │ + │ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │ + │ │ DNS │────►│ IP │────►│ Subnet │────►│ Port │ │ + │ │ │ │ Routing │ │ Routing │ │ Service │ │ + │ │ Name → │ │ Src/Dst │ │ Same net?│ │ Which │ │ + │ │ IP addr │ │ address │ │ Gateway? │ │ app? 
│   │
+ │  └─────────┘  └──────────┘  └──────────┘  └─────────┘   │
+ │    Task 1       Task 2       Task 3        Task 4       │
+ └──────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 📝 Complete Reference Cheat Sheet
+
+### DNS Commands
+
+| Command | Purpose |
+|---------|---------|
+| `dig <domain>` | Full DNS lookup with details |
+| `dig +short <domain>` | Quick — just the IP |
+| `dig <domain> MX` | Query MX (mail) records |
+| `dig <domain> NS` | Query nameservers |
+| `dig <domain> CNAME` | Query aliases |
+| `dig @8.8.8.8 <domain>` | Use specific DNS server |
+| `nslookup <domain>` | Simple DNS lookup |
+| `host <domain>` | Simplest DNS lookup |
+
+### IP & Network Commands
+
+| Command | Purpose |
+|---------|---------|
+| `ip addr show` | View all interfaces and IPs |
+| `ip route show` | View routing table |
+| `ip route get <ip>` | Show how traffic reaches a specific IP |
+| `hostname -I` | Show just the IP(s) |
+
+### Subnet Calculator (Mental Math)
+
+```
+ /24 → 256 IPs → 254 usable (most common)
+ /25 → 128 IPs → 126 usable (half of /24)
+ /26 → 64 IPs → 62 usable (quarter)
+ /27 → 32 IPs → 30 usable
+ /28 → 16 IPs → 14 usable
+ /29 → 8 IPs → 6 usable
+ /30 → 4 IPs → 2 usable (point-to-point)
+ /32 → 1 IP → 1 host (single host)
+
+ Each step up halves the hosts.
+ Each step down doubles the hosts.
+``` + +--- + +## 🏗️ Real-World DevOps Subnet Design (AWS VPC Example) + +``` + VPC: 10.0.0.0/16 (65,534 usable hosts) + + ┌──────────────────────────────────────────────────┐ + │ Public Subnets (internet-facing) │ + │ ┌─────────────────┐ ┌─────────────────┐ │ + │ │ 10.0.1.0/24 │ │ 10.0.2.0/24 │ │ + │ │ AZ: us-east-1a │ │ AZ: us-east-1b │ │ + │ │ ALB, NAT GW │ │ ALB, NAT GW │ │ + │ │ 254 hosts │ │ 254 hosts │ │ + │ └─────────────────┘ └─────────────────┘ │ + │ │ + │ Private Subnets (no direct internet access) │ + │ ┌─────────────────┐ ┌─────────────────┐ │ + │ │ 10.0.10.0/24 │ │ 10.0.11.0/24 │ │ + │ │ AZ: us-east-1a │ │ AZ: us-east-1b │ │ + │ │ App Servers │ │ App Servers │ │ + │ │ 254 hosts │ │ 254 hosts │ │ + │ └─────────────────┘ └─────────────────┘ │ + │ │ + │ Database Subnets (most restricted) │ + │ ┌─────────────────┐ ┌─────────────────┐ │ + │ │ 10.0.20.0/24 │ │ 10.0.21.0/24 │ │ + │ │ AZ: us-east-1a │ │ AZ: us-east-1b │ │ + │ │ RDS, ElastiCache │ │ RDS, ElastiCache │ │ + │ │ 254 hosts │ │ 254 hosts │ │ + │ └─────────────────┘ └─────────────────┘ │ + └──────────────────────────────────────────────────┘ +``` + +> **💡 This is exactly how production AWS architectures use subnetting!** +> Public subnets get internet access, private subnets host apps, and database subnets are the most isolated — all within one VPC. + +--- + +## 💡 What I Learned + +### 1. DNS Is the Internet's Phone Book — And Its Failure Mode Is Deceptive +DNS translates human-friendly names to IPs, and it operates as a hierarchical caching system. When it fails, **everything breaks** — but the network itself is fine. The key diagnostic trick: `ping 8.8.8.8` works but `ping google.com` doesn't? **It's DNS, not the network.** This single check saves hours of misdiagnosis. + +### 2. CIDR Notation Is a DevOps Daily Tool, Not Just Theory +Every AWS VPC, every Kubernetes network policy, every security group rule uses CIDR notation. 
Understanding that `/24` = 254 hosts and `/32` = single host is not academic — it's the difference between "allowing traffic from one server" (`10.0.1.50/32`) and "allowing traffic from the entire subnet" (`10.0.1.0/24`). Getting this wrong can create security vulnerabilities or block legitimate traffic. + +### 3. Ports Are the Missing Piece Between "Server Is Up" and "Service Is Working" +A server can be reachable (`ping` works) but the service unavailable (port not listening). The port is where **infrastructure meets application** — it's the handoff point. Knowing common ports (22=SSH, 80=HTTP, 443=HTTPS, 3306=MySQL) lets you instantly correlate `ss -tulpn` output with expected services and spot misconfigurations. + +--- diff --git a/2026/day-16/check_number.sh b/2026/day-16/check_number.sh new file mode 100755 index 0000000000..1a4b522017 --- /dev/null +++ b/2026/day-16/check_number.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# Day 16 - Task 4a: If-Else Conditions — Number Check +# This script takes a number and determines if it is positive, negative, or zero + +read -p "Enter a number: " NUM + +if [ "$NUM" -gt 0 ]; then + echo "$NUM is a positive number ✅" +elif [ "$NUM" -lt 0 ]; then + echo "$NUM is a negative number ❌" +else + echo "The number is zero 🔵" +fi diff --git a/2026/day-16/day-16-shell-scripting.md b/2026/day-16/day-16-shell-scripting.md new file mode 100644 index 0000000000..80f43f1b88 --- /dev/null +++ b/2026/day-16/day-16-shell-scripting.md @@ -0,0 +1,844 @@ +# Day 16 – Shell Scripting Basics + +**Date:** 2026-02-19 +**Author:** Rameez Ahmed +**Challenge:** Start your shell scripting journey — learn the fundamentals every script needs +**Reference:** [90DaysOfDevOps](https://github.com/LondheShubham153/90DaysOfDevOps) + +--- + +## 📋 Overview + +**Shell scripting** is the art of automating tasks on Linux by writing sequences of commands in a text file that the shell interpreter executes line-by-line. 
For a DevOps Engineer, shell scripts are the **first tool in your automation toolbox** — they glue together system commands, enable repeatable workflows, and form the backbone of CI/CD pipelines, provisioning scripts, and monitoring systems. + +> **🎯 Why Shell Scripting Matters for DevOps:** +> Every server interaction — from deploying applications to rotating logs — can be captured in a script and executed consistently across hundreds of machines. The difference between an operator and an engineer is **automation**, and that starts here. + +--- + +## 🏗️ How a Shell Script Executes + +Understanding the execution flow helps you debug scripts effectively: + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ SHELL SCRIPT EXECUTION FLOW │ +└──────────────────────────────────────────────────────────────────┘ + + 📝 script.sh 🖥️ Terminal + ┌─────────────────────┐ + │ #!/bin/bash │ ──── ① Shebang tells the OS which + │ │ interpreter to use (/bin/bash) + │ NAME="Rameez" │ ──── ② Variables are stored in memory + │ │ + │ read -p "Input: " X │ ──── ③ read pauses and waits for + │ │ user input from stdin + │ if [ "$X" = "y" ]; │ ──── ④ Conditions are evaluated + │ then │ (exit code 0 = true) + │ echo "Yes!" 
│ ──── ⑤ Commands run sequentially + │ fi │ top to bottom + └─────────────────────┘ + + │ + ▼ + ┌─────────────────────────────────────────────┐ + │ EXECUTION SEQUENCE │ + │ │ + │ Step 1: OS reads shebang → launches bash │ + │ Step 2: bash reads file line-by-line │ + │ Step 3: Each line is parsed & executed │ + │ Step 4: Variables are replaced (expanded) │ + │ Step 5: Output goes to stdout/stderr │ + │ Step 6: Exit code returned (0 = success) │ + └─────────────────────────────────────────────┘ +``` + +--- + +## 🔑 Core Concepts at a Glance + +| Concept | What It Does | Syntax Example | +|---------|-------------|----------------| +| **Shebang** | Tells the OS which interpreter to use | `#!/bin/bash` | +| **Variables** | Store and reuse values | `NAME="Rameez"` | +| **echo** | Print text to standard output | `echo "Hello, $NAME"` | +| **read** | Accept user input from keyboard | `read -p "Enter: " VAR` | +| **if-else** | Conditional branching logic | `if [ cond ]; then ... fi` | +| **Exit Codes** | `0` = success, non-zero = failure | `echo $?` to check | +| **Comments** | Lines starting with `#` are ignored | `# This is a comment` | +| **chmod** | Change file permissions to make executable | `chmod +x script.sh` | + +--- + +## 🛠️ Challenge Tasks + +### Task 1: Your First Script — `hello.sh` + +The simplest possible script that teaches the two most fundamental concepts: the **shebang** and **echo**. + +#### 📄 Script Code + +```bash +#!/bin/bash +# Day 16 - Task 1: Your First Script +# This script prints a greeting message to the terminal + +echo "Hello, DevOps!" +``` + +#### ▶️ How to Run + +```bash +# Step 1: Make the script executable +chmod +x hello.sh + +# Step 2: Run the script +./hello.sh +``` + +#### 📤 Output + +``` +Hello, DevOps! +``` + +#### 🔬 Deep Dive: The Shebang (`#!/bin/bash`) + +The **shebang** (also called **hashbang**) is the very first line of a script. It tells the operating system which interpreter should execute the file. 
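
To see the shebang doing its job, here's a quick throwaway demo — the `/tmp/shebang_demo.sh` path and the version-printing trick are just illustrative:

```shell
# Write a tiny script whose shebang points at bash
cat > /tmp/shebang_demo.sh <<'EOF'
#!/bin/bash
# BASH_VERSION is only set when bash itself is interpreting the file
echo "running under bash ${BASH_VERSION%%(*}"
EOF

chmod +x /tmp/shebang_demo.sh   # the execute bit is required for direct execution
/tmp/shebang_demo.sh            # kernel reads '#!' and hands the file to /bin/bash
head -1 /tmp/shebang_demo.sh    # the shebang is literally just the first line
```

Running it should print a `running under bash …` line followed by the shebang itself — proof that the kernel, not you, picked the interpreter.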
+ +``` + #!/bin/bash + ││ └──────── Path to the interpreter binary + │└────────── ! (bang) + └─────────── # (hash/sharp) +``` + +**What happens if you remove the shebang?** + +| Scenario | Behavior | +|----------|----------| +| Running with `./hello.sh` | The system uses the **current shell** (could be `bash`, `zsh`, `sh`, `dash`, etc.) — this may produce unexpected results if the script uses bash-specific features | +| Running with `bash hello.sh` | Works fine because you explicitly told the OS to use `bash` | +| On a server with `sh` as default | Bash-specific syntax like `[[ ]]` or `(( ))` will **fail** | + +> **💡 Best Practice:** **Always include the shebang.** It makes your scripts portable and self-documenting. In production, `#!/bin/bash` is the standard, but for maximum portability use `#!/usr/bin/env bash`. + +--- + +### Task 2: Variables — `variables.sh` + +Variables are the building blocks of any script — they let you store data, pass configuration, and build dynamic commands. 
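
Variables also aren't limited to literal strings: with command substitution they can capture a command's output, which is how real scripts build dynamic values. A minimal sketch (the `TODAY` and `BACKUP` names are illustrative):

```shell
# Command substitution: run a command and store its output in a variable
TODAY=$(date +%F)                   # ISO date, e.g. 2026-02-19
BACKUP="app-backup-$TODAY.tar.gz"   # build a dynamic file name from it

echo "Today is $TODAY"
echo "Backup file would be: $BACKUP"
```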
+ +#### 📄 Script Code + +```bash +#!/bin/bash +# Day 16 - Task 2: Variables +# Demonstrates variable assignment and the difference between single and double quotes + +NAME="Rameez" +ROLE="DevOps Engineer" + +# Double quotes — variables are expanded (interpolated) +echo "Hello, I am $NAME and I am a $ROLE" + +# Single quotes — everything is treated as a literal string (no expansion) +echo 'Hello, I am $NAME and I am a $ROLE' +``` + +#### ▶️ How to Run + +```bash +chmod +x variables.sh +./variables.sh +``` + +#### 📤 Output + +``` +Hello, I am Rameez and I am a DevOps Engineer +Hello, I am $NAME and I am a $ROLE +``` + +#### 🔬 Deep Dive: Single Quotes vs Double Quotes + +This is one of the most common sources of confusion in shell scripting: + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ QUOTING BEHAVIOR IN BASH │ +├─────────────────────────────────────────────────────────────────┤ +│ │ +│ Double Quotes " " Single Quotes ' ' │ +│ ┌─────────────────────────┐ ┌─────────────────────┐ │ +│ │ • Variables EXPANDED │ │ • Everything LITERAL │ │ +│ │ • $NAME → "Rameez" │ │ • $NAME → "$NAME" │ │ +│ │ • Command sub works │ │ • No expansion │ │ +│ │ • Backslash escapes work │ │ • Nothing is special│ │ +│ │ • Backticks work │ │ • Safest quoting │ │ +│ └─────────────────────────┘ └─────────────────────┘ │ +│ │ +│ Example: Example: │ +│ echo "Hi $NAME" → Hi Rameez echo 'Hi $NAME' → Hi $NAME│ +│ echo "$(date)" → 2026-02-19... 
echo '$(date)' → $(date) │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

| Quote Type | Variables | Commands | Special Chars | Use When |
|------------|-----------|----------|---------------|----------|
| **Double `" "`** | ✅ Expanded | ✅ `$(cmd)` works | ⚠️ `\n`, `\t` need `echo -e`/`printf` | You need variable substitution |
| **Single `' '`** | ❌ Literal | ❌ Literal | ❌ All literal | You want exact text, no processing |
| **None** | ✅ Expanded | ✅ Works | ⚠️ Word splitting risk | Simple single-word values only |

> **⚠️ Variable Assignment Rules:**
> - **No spaces** around `=`: `NAME="Rameez"` ✅ but `NAME = "Rameez"` ❌
> - Variable names are **case-sensitive**: `$name` ≠ `$NAME`
> - By convention, use **UPPERCASE** for environment/global variables and **lowercase** for local ones

---

### Task 3: User Input with `read` — `greet.sh`

Interactive scripts that can accept user input at runtime are essential for building tools your team can use.

#### 📄 Script Code

```bash
#!/bin/bash
# Day 16 - Task 3: User Input with read
# This script asks for the user's name and favourite tool, then greets them

read -p "Enter your name: " NAME
read -p "Enter your favourite tool: " TOOL

echo "Hello $NAME, your favourite tool is $TOOL"
```

#### ▶️ How to Run

```bash
chmod +x greet.sh
./greet.sh
```

#### 📤 Output

```
Enter your name: Rameez
Enter your favourite tool: Docker
Hello Rameez, your favourite tool is Docker
```

#### 🔬 Deep Dive: The `read` Command

```
┌──────────────────────────────────────────────────────────────┐
│                     read COMMAND OPTIONS                     │
├─────────────┬────────────────────────────────────────────────┤
│ Flag        │ What It Does                                   │
├─────────────┼────────────────────────────────────────────────┤
│ -p "text"   │ Display a prompt before reading input          │
│ -s          │ Silent mode (hide input) — good for passwords  │
│ -t 5        │ Timeout after 5 seconds                        │
│ -n 1        │ Read only 1 character (no Enter needed)        │
│ -r          │ 
Don't treat backslash as escape character │ +│ -a ARRAY │ Read into an array variable │ +└─────────────┴────────────────────────────────────────────────┘ +``` + +**Practical examples:** + +```bash +# Read a password (hidden input) +read -sp "Enter password: " PASSWORD +echo "" # New line since -s suppresses it + +# Read with a timeout (useful in automation) +read -t 10 -p "Continue? (y/n): " ANSWER + +# Read a single keypress +read -n 1 -p "Press any key to continue..." +``` + +> **💡 DevOps Tip:** In CI/CD pipelines, scripts usually receive input via **environment variables** or **command-line arguments** instead of `read`, since there's no human to type. Use `read` for interactive tools; use `$1`, `$2`, or `$ENV_VAR` for automated scripts. + +--- + +### Task 4: If-Else Conditions + +Conditional logic is the brain of your scripts — it lets them make decisions and respond to different situations. + +#### Task 4a: Number Checker — `check_number.sh` + +#### 📄 Script Code + +```bash +#!/bin/bash +# Day 16 - Task 4a: If-Else Conditions — Number Check +# This script takes a number and determines if it is positive, negative, or zero + +read -p "Enter a number: " NUM + +if [ "$NUM" -gt 0 ]; then + echo "$NUM is a positive number ✅" +elif [ "$NUM" -lt 0 ]; then + echo "$NUM is a negative number ❌" +else + echo "The number is zero 🔵" +fi +``` + +#### ▶️ How to Run + +```bash +chmod +x check_number.sh +./check_number.sh +``` + +#### 📤 Output (3 runs) + +``` +# Run 1: +Enter a number: 42 +42 is a positive number ✅ + +# Run 2: +Enter a number: -7 +-7 is a negative number ❌ + +# Run 3: +Enter a number: 0 +The number is zero 🔵 +``` + +--- + +#### Task 4b: File Checker — `file_check.sh` + +#### 📄 Script Code + +```bash +#!/bin/bash +# Day 16 - Task 4b: If-Else Conditions — File Check +# This script asks for a filename and checks whether it exists + +read -p "Enter a filename to check: " FILENAME + +if [ -f "$FILENAME" ]; then + echo "✅ File '$FILENAME' exists!" 
+ echo " Size: $(du -h "$FILENAME" | cut -f1)" + echo " Last modified: $(stat -c '%y' "$FILENAME" | cut -d'.' -f1)" +else + echo "❌ File '$FILENAME' does NOT exist." +fi +``` + +#### ▶️ How to Run + +```bash +chmod +x file_check.sh +./file_check.sh +``` + +#### 📤 Output + +``` +# Checking an existing file: +Enter a filename to check: hello.sh +✅ File 'hello.sh' exists! + Size: 4.0K + Last modified: 2026-02-19 17:15:32 + +# Checking a non-existent file: +Enter a filename to check: ghost.txt +❌ File 'ghost.txt' does NOT exist. +``` + +#### 🔬 Deep Dive: If-Else Syntax & Test Operators + +**The anatomy of an if statement:** + +``` + if [ condition ]; then ← Opening (note the spaces inside [ ]) + command1 ← Runs if condition is TRUE (exit code 0) + command2 + elif [ condition2 ]; then ← Optional: additional condition + command3 + else ← Optional: fallback if nothing matched + command4 + fi ← Closing (fi = if backwards) +``` + +> **⚠️ Critical Syntax Rules:** +> - **Spaces inside `[ ]` are mandatory**: `[ "$X" -gt 0 ]` ✅ vs `["$X" -gt 0]` ❌ +> - **Quote your variables**: `[ "$NUM" -gt 0 ]` ✅ vs `[ $NUM -gt 0 ]` ❌ (breaks with empty input) +> - **Semicolon before `then`** (or put `then` on next line) + +**Common Test Operators:** + +| Category | Operator | Meaning | Example | +|----------|----------|---------|---------| +| **Numbers** | `-eq` | Equal | `[ "$a" -eq "$b" ]` | +| | `-ne` | Not equal | `[ "$a" -ne "$b" ]` | +| | `-gt` | Greater than | `[ "$a" -gt 10 ]` | +| | `-lt` | Less than | `[ "$a" -lt 10 ]` | +| | `-ge` | Greater or equal | `[ "$a" -ge 5 ]` | +| | `-le` | Less or equal | `[ "$a" -le 100 ]` | +| **Strings** | `=` | Equal | `[ "$a" = "yes" ]` | +| | `!=` | Not equal | `[ "$a" != "no" ]` | +| | `-z` | Is empty | `[ -z "$a" ]` | +| | `-n` | Is not empty | `[ -n "$a" ]` | +| **Files** | `-f` | File exists (regular) | `[ -f /etc/hosts ]` | +| | `-d` | Directory exists | `[ -d /var/log ]` | +| | `-r` | File is readable | `[ -r config.yml ]` | +| | `-w` | File 
is writable | `[ -w /tmp/output ]` | +| | `-x` | File is executable | `[ -x ./deploy.sh ]` | +| | `-s` | File is non-empty | `[ -s logfile.log ]` | +| **Logic** | `-a` or `&&` | AND | `[ cond1 ] && [ cond2 ]` | +| | `-o` or `\|\|` | OR | `[ cond1 ] \|\| [ cond2 ]` | +| | `!` | NOT | `[ ! -f file ]` | + +--- + +### Task 5: Combine It All — `server_check.sh` + +This script brings together everything: variables, `read`, and `if-else` — simulating a real-world DevOps tool that checks service health. + +#### 📄 Script Code + +```bash +#!/bin/bash +# Day 16 - Task 5: Combine It All — Server Status Checker +# This script combines variables, user input, and if-else logic +# to check whether a system service is active or not + +SERVICE="nginx" + +echo "============================================" +echo " 🖥️ Server Service Checker" +echo "============================================" +echo "" +echo "Service selected: $SERVICE" +echo "" + +read -p "Do you want to check the status of '$SERVICE'? (y/n): " CHOICE + +if [ "$CHOICE" = "y" ] || [ "$CHOICE" = "Y" ]; then + echo "" + echo "Checking status of '$SERVICE'..." + echo "--------------------------------------------" + + if systemctl is-active --quiet "$SERVICE"; then + echo "✅ Service '$SERVICE' is ACTIVE and running." + else + echo "❌ Service '$SERVICE' is NOT active." + fi + + echo "--------------------------------------------" + echo "" + echo "Full status output:" + systemctl status "$SERVICE" --no-pager 2>/dev/null || echo "(Could not retrieve full status)" +else + echo "" + echo "⏭️ Skipped." +fi +``` + +#### ▶️ How to Run + +```bash +chmod +x server_check.sh +./server_check.sh +``` + +#### 📤 Output (Service Active) + +``` +============================================ + 🖥️ Server Service Checker +============================================ + +Service selected: nginx + +Do you want to check the status of 'nginx'? (y/n): y + +Checking status of 'nginx'... 
+-------------------------------------------- +✅ Service 'nginx' is ACTIVE and running. +-------------------------------------------- + +Full status output: +● nginx.service - A high performance web server and reverse proxy server + Loaded: loaded (/lib/systemd/system/nginx.service; enabled) + Active: active (running) since Wed 2026-02-19 12:30:00 PKT + Main PID: 1234 (nginx) + Tasks: 2 + Memory: 5.6M + CPU: 32ms +``` + +#### 📤 Output (User Skips) + +``` +============================================ + 🖥️ Server Service Checker +============================================ + +Service selected: nginx + +Do you want to check the status of 'nginx'? (y/n): n + +⏭️ Skipped. +``` + +#### 🔬 Script Breakdown + +``` +┌────────────────────────────────────────────────────────────────────┐ +│ server_check.sh — FLOW DIAGRAM │ +└────────────────────────────────────────────────────────────────────┘ + + ┌──────────────────┐ + │ Start │ + └────────┬─────────┘ + │ + ▼ + ┌──────────────────┐ + │ SERVICE="nginx" │ ← Variable stores the service name + │ Display banner │ + └────────┬─────────┘ + │ + ▼ + ┌──────────────────┐ + │ read CHOICE │ ← User inputs y or n + └────────┬─────────┘ + │ + ┌────┴────┐ + │ y or Y? │ + └────┬────┘ + yes │ no + │ │ + ▼ ▼ + ┌────────────┐ ┌──────────┐ + │ systemctl │ │ "Skipped"│ + │ is-active? 
│ └──────────┘ + └──────┬─────┘ + yes │ no + │ │ + ▼ ▼ + ┌──────┐ ┌──────────┐ + │ACTIVE│ │NOT active │ + └──────┘ └──────────┘ + │ + ▼ + ┌──────────────────┐ + │ Show full status │ + │ (systemctl status)│ + └──────────────────┘ +``` + +--- + +## 📊 Script Summary Table + +| Script | Concepts Used | File | Purpose | +|--------|--------------|------|---------| +| `hello.sh` | Shebang, `echo` | Task 1 | Print a greeting message | +| `variables.sh` | Variables, quoting | Task 2 | Demonstrate variable expansion and quoting | +| `greet.sh` | `read`, variables | Task 3 | Interactive user greeting | +| `check_number.sh` | `read`, `if-elif-else` | Task 4a | Classify numbers as positive/negative/zero | +| `file_check.sh` | `read`, `if-else`, `-f` test | Task 4b | Check if a file exists | +| `server_check.sh` | Variables, `read`, `if-else`, `systemctl` | Task 5 | Check service status interactively | + +--- + +## 🧰 Essential Shell Scripting Command Reference + +### Script Execution + +| Action | Command | Example | +|--------|---------|---------| +| Make executable | `chmod +x` | `chmod +x deploy.sh` | +| Run with `./` | `./script.sh` | `./hello.sh` | +| Run with interpreter | `bash script.sh` | `bash hello.sh` | +| Run in debug mode | `bash -x script.sh` | `bash -x deploy.sh` | +| Check syntax only | `bash -n script.sh` | `bash -n deploy.sh` | + +### Variable Operations + +| Action | Syntax | Example | +|--------|--------|---------| +| Assign | `VAR=value` | `NAME="Rameez"` | +| Access | `$VAR` or `${VAR}` | `echo "$NAME"` | +| Default value | `${VAR:-default}` | `echo "${NAME:-Guest}"` | +| Command substitution | `$(command)` | `TODAY=$(date +%F)` | +| Arithmetic | `$((expression))` | `TOTAL=$((5 + 3))` | +| String length | `${#VAR}` | `echo "${#NAME}"` | +| Export to child processes | `export VAR` | `export PATH="/usr/local/bin:$PATH"` | + +### Input/Output + +| Action | Command | Example | +|--------|---------|---------| +| Print text | `echo` | `echo "Hello"` | +| Print 
formatted | `printf` | `printf "%-10s %s\n" "Name:" "$NAME"` | +| Read input | `read -p` | `read -p "Enter: " VAR` | +| Read silently | `read -sp` | `read -sp "Password: " PASS` | +| Redirect stdout | `>` or `>>` | `echo "log" >> file.log` | +| Redirect stderr | `2>` | `cmd 2> errors.log` | +| Redirect both | `&>` | `cmd &> output.log` | + +--- + +## 🔄 Real-World DevOps Scenarios + +### Scenario 1: Automated Health Check Script + +```bash +#!/bin/bash +# Check multiple services and report their status + +SERVICES=("nginx" "sshd" "docker" "cron") + +echo "==============================" +echo " SERVICE HEALTH CHECK REPORT" +echo " $(date)" +echo "==============================" + +for SVC in "${SERVICES[@]}"; do + if systemctl is-active --quiet "$SVC" 2>/dev/null; then + echo " ✅ $SVC — RUNNING" + else + echo " ❌ $SVC — DOWN" + fi +done +``` + +### Scenario 2: Deployment Pre-flight Checker + +```bash +#!/bin/bash +# Verify prerequisites before deploying an application + +echo "🔍 Running pre-flight checks..." + +CHECKS_PASSED=true + +# Check if Docker is installed +if ! command -v docker &>/dev/null; then + echo " ❌ Docker is not installed" + CHECKS_PASSED=false +else + echo " ✅ Docker: $(docker --version | cut -d' ' -f3)" +fi + +# Check if config file exists +if [ ! -f "./config.yml" ]; then + echo " ❌ config.yml not found" + CHECKS_PASSED=false +else + echo " ✅ config.yml found" +fi + +# Check disk space (need at least 1GB free) +FREE_SPACE=$(df / --output=avail -BG | tail -1 | tr -d ' G') +if [ "$FREE_SPACE" -lt 1 ]; then + echo " ❌ Insufficient disk space: ${FREE_SPACE}G" + CHECKS_PASSED=false +else + echo " ✅ Disk space: ${FREE_SPACE}G available" +fi + +if [ "$CHECKS_PASSED" = true ]; then + echo "" + echo "✅ All checks passed — ready to deploy!" +else + echo "" + echo "❌ Pre-flight checks FAILED — fix issues before deploying." 
+ exit 1 +fi +``` + +### Scenario 3: Log Rotation Script + +```bash +#!/bin/bash +# Rotate application logs — keep only the last 7 days + +LOG_DIR="/var/log/myapp" +RETENTION_DAYS=7 + +echo "🔄 Rotating logs in $LOG_DIR (keeping last $RETENTION_DAYS days)..." + +if [ -d "$LOG_DIR" ]; then + DELETED=$(find "$LOG_DIR" -name "*.log" -mtime +$RETENTION_DAYS -delete -print | wc -l) + echo "✅ Deleted $DELETED old log files." +else + echo "❌ Log directory $LOG_DIR does not exist." +fi +``` + +--- + +## 🆚 Script Execution Methods Compared + +| Method | Command | Shebang Used? | Needs `chmod +x`? | Runs In | +|--------|---------|:------------:|:-----------------:|---------| +| Direct execution | `./script.sh` | ✅ Yes | ✅ Yes | Subshell | +| Explicit interpreter | `bash script.sh` | ❌ Ignored | ❌ No | Subshell | +| Source (dot) | `. script.sh` | ❌ Ignored | ❌ No | **Current** shell | +| Source | `source script.sh` | ❌ Ignored | ❌ No | **Current** shell | + +> **💡 Key Difference:** `./script.sh` and `bash script.sh` run in a **subshell** — variables set inside the script disappear when it finishes. `source script.sh` runs in the **current** shell — variables persist. This is why you use `source ~/.bashrc` to reload your configuration. + +--- + +## 🧹 Script Writing Best Practices + +``` +┌──────────────────────────────────────────────────────────────┐ +│ SHELL SCRIPTING BEST PRACTICES │ +├──────────────────────────────────────────────────────────────┤ +│ │ +│ 1. #!/bin/bash Always include the shebang │ +│ │ +│ 2. set -euo pipefail Exit on errors, undefined vars, │ +│ and pipe failures (production) │ +│ │ +│ 3. "$VARIABLE" Always quote your variables │ +│ │ +│ 4. # Comments Explain WHY, not WHAT │ +│ │ +│ 5. shellcheck Lint your scripts before deploy │ +│ │ +│ 6. Meaningful names SERVICE_NAME > SN │ +│ │ +│ 7. Exit codes Use exit 0 (success) / exit 1 │ +│ │ +│ 8. Error handling Check if commands succeed │ +│ │ +│ 9. DRY principle Use functions for repeated logic │ +│ │ +│ 10. 
Test on staging     Never deploy untested scripts │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### The `set -euo pipefail` Safety Net

For production scripts, always add this near the top:

```bash
#!/bin/bash
set -euo pipefail

# -e → Exit immediately if any command returns non-zero
# -u → Treat unset variables as an error
# -o pipefail → A pipeline fails if ANY command in it fails
```

| Flag | Without It | With It |
|------|-----------|---------|
| `-e` | Script continues even after errors | Script stops at first error |
| `-u` | Unset variables silently become empty | Error raised for unset variables |
| `-o pipefail` | `cmd1 \| cmd2` succeeds if `cmd2` succeeds | Fails if `cmd1` OR `cmd2` fails |

---

## 🔍 Troubleshooting Guide

| Issue | Cause | Solution |
|-------|-------|----------|
| `Permission denied` when running `./script.sh` | Script is not executable | `chmod +x script.sh` |
| `command not found` when running `./script.sh` | Missing shebang or wrong interpreter path | Add `#!/bin/bash` as line 1 |
| `unexpected operator` error | Using bash syntax with `sh` | Ensure shebang is `#!/bin/bash`, not `#!/bin/sh` |
| Variables not expanding | Using single quotes `' '` | Switch to double quotes `" "` for expansion |
| `unary operator expected` | Variable is empty in `[ ]` | Quote your variable: `[ "$VAR" -gt 0 ]` |
| `integer expression expected` | Non-numeric input to `-gt`, `-lt`, etc. 
| Validate input before comparison | +| Script works interactively but fails in cron | Different `PATH` in cron environment | Use full paths: `/usr/bin/echo` instead of `echo` | +| `\r: command not found` | Script created on Windows (CRLF line endings) | Convert: `dos2unix script.sh` or `sed -i 's/\r$//' script.sh` | +| `read` not waiting for input in pipeline | stdin is consumed by the pipeline | Use `read < /dev/tty` for interactive input | +| Variables lost after script finishes | Script runs in a subshell | Use `source script.sh` to run in current shell | + +--- + +## 🐛 Debugging Your Scripts + +When a script doesn't behave as expected, use these techniques: + +```bash +# Method 1: Run in debug mode (prints each command before executing) +bash -x ./script.sh + +# Method 2: Add debug mode to specific sections of your script +set -x # Turn on debugging +# ... commands to debug ... +set +x # Turn off debugging + +# Method 3: Check syntax without running +bash -n ./script.sh + +# Method 4: Use shellcheck (static analysis linter) +shellcheck ./script.sh +``` + +**Debug output example (bash -x):** + +``` ++ NAME=Rameez ++ ROLE='DevOps Engineer' ++ echo 'Hello, I am Rameez and I am a DevOps Engineer' +Hello, I am Rameez and I am a DevOps Engineer ++ echo 'Hello, I am $NAME and I am a $ROLE' +Hello, I am $NAME and I am a $ROLE +``` + +> **💡 The `+` prefix** shows each command after variable expansion, letting you see exactly what bash is executing. This is invaluable for finding bugs in complex scripts. + +--- + +## 💡 What I Learned + +### 1. The Shebang Is Not Just a Formality — It's a Contract +The `#!/bin/bash` line defines which interpreter runs your script. Without it, your script becomes **non-portable** — it might work on your machine with `bash` as the default shell, but fail on a server using `dash` or `sh`. 
In DevOps, where scripts run across different environments (local, CI runners, production servers, containers), the shebang guarantees consistent behavior. + +### 2. Quoting Variables Is a Non-Negotiable Habit +The difference between `$VAR` and `"$VAR"` can be the difference between a working deploy and a catastrophic bug. Unquoted variables undergo **word splitting** — if `FILE="my report.txt"`, then `rm $FILE` tries to delete TWO files (`my` and `report.txt`), while `rm "$FILE"` correctly handles the space. Always quote. Always. + +### 3. Shell Scripts Are the Gateway to Infrastructure as Code +Every Ansible playbook, Terraform provisioner, Docker entrypoint, and CI/CD pipeline step ultimately calls shell commands. Understanding `if-else`, `read`, and `systemctl` at this level makes you fluent in the language that every DevOps tool speaks underneath. These basics don't just teach scripting — they teach **systems thinking**. + +--- + +## 📁 Files Created + +``` +day-16/ +├── README.md # Task requirements +├── day-16-shell-scripting.md # This documentation file +├── hello.sh # Task 1: First script +├── variables.sh # Task 2: Variables and quoting +├── greet.sh # Task 3: User input with read +├── check_number.sh # Task 4a: Number checker +├── file_check.sh # Task 4b: File existence checker +└── server_check.sh # Task 5: Service status checker +``` + +--- + +## 🚀 What's Next? + +Shell scripting builds progressively. 
Here's the learning path ahead: + +``` + Day 16 (TODAY) Day 17+ Day 18+ + ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ + │ ✅ Shebang │ │ • Loops │ │ • Functions │ + │ ✅ Variables │ ───► │ • for/while │ ────► │ • Arrays │ + │ ✅ echo/read │ │ • Case stmt │ │ • Error handling│ + │ ✅ If-Else │ │ • Arguments │ │ • Cron jobs │ + └───────────────┘ └───────────────┘ └───────────────┘ +``` + +--- diff --git a/2026/day-16/file_check.sh b/2026/day-16/file_check.sh new file mode 100755 index 0000000000..15d8181df2 --- /dev/null +++ b/2026/day-16/file_check.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# Day 16 - Task 4b: If-Else Conditions — File Check +# This script asks for a filename and checks whether it exists + +read -p "Enter a filename to check: " FILENAME + +if [ -f "$FILENAME" ]; then + echo "✅ File '$FILENAME' exists!" + echo " Size: $(du -h "$FILENAME" | cut -f1)" + echo " Last modified: $(stat -c '%y' "$FILENAME" | cut -d'.' -f1)" +else + echo "❌ File '$FILENAME' does NOT exist." +fi diff --git a/2026/day-16/greet.sh b/2026/day-16/greet.sh new file mode 100755 index 0000000000..28718b5875 --- /dev/null +++ b/2026/day-16/greet.sh @@ -0,0 +1,8 @@ +#!/bin/bash +# Day 16 - Task 3: User Input with read +# This script asks for the user's name and favourite tool, then greets them + +read -p "Enter your name: " NAME +read -p "Enter your favourite tool: " TOOL + +echo "Hello $NAME, your favourite tool is $TOOL" diff --git a/2026/day-16/hello.sh b/2026/day-16/hello.sh new file mode 100755 index 0000000000..f6c296a09a --- /dev/null +++ b/2026/day-16/hello.sh @@ -0,0 +1,5 @@ +#!/bin/bash +# Day 16 - Task 1: Your First Script +# This script prints a greeting message to the terminal + +echo "Hello, DevOps!" 
diff --git a/2026/day-16/server_check.sh b/2026/day-16/server_check.sh new file mode 100755 index 0000000000..df3cda124a --- /dev/null +++ b/2026/day-16/server_check.sh @@ -0,0 +1,35 @@ +#!/bin/bash +# Day 16 - Task 5: Combine It All — Server Status Checker +# This script combines variables, user input, and if-else logic +# to check whether a system service is active or not + +SERVICE="nginx" + +echo "============================================" +echo " 🖥️ Server Service Checker" +echo "============================================" +echo "" +echo "Service selected: $SERVICE" +echo "" + +read -p "Do you want to check the status of '$SERVICE'? (y/n): " CHOICE + +if [ "$CHOICE" = "y" ] || [ "$CHOICE" = "Y" ]; then + echo "" + echo "Checking status of '$SERVICE'..." + echo "--------------------------------------------" + + if systemctl is-active --quiet "$SERVICE"; then + echo "✅ Service '$SERVICE' is ACTIVE and running." + else + echo "❌ Service '$SERVICE' is NOT active." + fi + + echo "--------------------------------------------" + echo "" + echo "Full status output:" + systemctl status "$SERVICE" --no-pager 2>/dev/null || echo "(Could not retrieve full status)" +else + echo "" + echo "⏭️ Skipped." +fi diff --git a/2026/day-16/variables.sh b/2026/day-16/variables.sh new file mode 100755 index 0000000000..096a2161d6 --- /dev/null +++ b/2026/day-16/variables.sh @@ -0,0 +1,12 @@ +#!/bin/bash +# Day 16 - Task 2: Variables +# Demonstrates variable assignment and the difference between single and double quotes + +NAME="Rameez" +ROLE="DevOps Engineer" + +# Double quotes — variables are expanded (interpolated) +echo "Hello, I am $NAME and I am a $ROLE" + +# Single quotes — everything is treated as a literal string (no expansion) +echo 'Hello, I am $NAME and I am a $ROLE'