Index your objects
The purpose of this project is to allow using cheap Single Board Computers, each with one or two cheap HDDs, to store important data. No RAID: it only works well with expensive disks, still has a single point of failure in the controller, and is difficult to recover. No NAS/NFS: building a cluster is too difficult. An HTTP-based object store is the way to go.
The goal is not to replicate POSIX/NFS but to store large WORM (write once, read many) files with basic metadata in a way that is better than a POSIX filesystem.
Inspired by projects like:
Consume S3 API(s) (from MinIO or the like) and expose a rich metadata store.
pip3 install https://github.com/ctengel/objectindex/archive/refs/heads/main.zip
There are then a few different ways to use this:
- RESTful API:
  FLASK_APP=obj_idx.api OBJIDX_SETTINGS=/path/to/api.cfg flask run --host=0.0.0.0
  - need simpler-objects running
  - need postgres running and set up
    - see
    OBJIDX_SETTINGS=/path/to/api.cfg python3 -m obj_idx.db_create
    - see
  - need API config file (see below)
- GUI:
  FLASK_APP=obj_idx.gui OBJIDX_GUI_SETTINGS=/path/to/gui.cfg flask run --port 5001 --host=0.0.0.0
  - need GUI config file (see below)
- CLI client:
obj-idx-client
Hardware and such:
- Raspberry Pi 3B, 3B+, 400
- starting specifically with 3B+
- tuning may be needed for Pis older than 4/400
- External USB hard drive with SMR
- note that HDDs like this don't play well with additional USB devices such as an SSD plugged in alongside them; if you want to do this, you will need an extra power source such as a powered USB hub
- ext4 format
- strongly considering XFS
- standalone/non erasure
- note that MinIO deprecated the single-node single-drive standalone mode in late 2022 and introduced single-drive erasure coding, which is what is used now
- 32GB microSDHC card
- keep the swap here; putting it on USB just overloads USB power/traffic
- Download 2022-04-04-raspios-bullseye-arm64-lite.img.xz or similar from https://www.raspberrypi.com/software/operating-systems/ and write it to the SD card (/dev/sda here; check the device name first, since dd overwrites it):
  xzcat 2022-04-04-raspios-bullseye-arm64-lite.img.xz | sudo dd of=/dev/sda bs=4096
- Boot
- sudo raspi-config
  - ssh
  - hostname
  - disable autologin
  - locale
  - handle wifi killswitch?
  - etc
- /etc/dhcpcd.conf:
  interface eth0
  static ip_address=192.168.1.254/24
  static routers=192.168.1.1
  static domain_name_servers=192.168.1.1
- sudo apt update; sudo apt upgrade
- sudo parted -a optimal /dev/sdX
  $ sudo parted -a optimal /dev/sdX
  GNU Parted 3.4
  Using /dev/sdX
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) help
  ...
  (parted) mklabel
  New disk label type? gpt
  Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
  Yes/No? y
  (parted) mkpart
  Partition name?  []? ...
  File system type?  [ext2]? ext4
  Start? 0%
  End? 100%
  (parted) print
  Model: ...
  Disk /dev/sdb: 2000GB
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt
  Disk Flags:
  Number  Start   End     Size    File system  Name  Flags
   1      1049kB  2000GB  2000GB  ext4         ...
  (parted) quit
  - resulting partition table on a 5 TB drive:
  Model: Seagate BUP Portable (scsi)
  Disk /dev/sda: 5001GB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags:
  Number  Start   End     Size    File system  Name      Flags
   1      1049kB  5001GB  5001GB  ext4         obj1data
- sudo mkfs.ext4 /dev/sda1
- sudo mkdir /mnt/obj1data
- sudo blkid -s PARTUUID /dev/sda1
- /etc/fstab (using the PARTUUID reported by blkid):
  PARTUUID= /mnt/obj1data ext4 defaults,noatime 0 2
  - add noauto to the options to prevent an attempt to mount at boot if you swap removable drives
- sudo useradd -mU minio
  - alternatively, groupadd -g 1234 minio; useradd -m -u 1234 -g 1234 minio may be used to set a specific UID/GID
  - userdel -r minio can be used to uninstall
- sudo chown minio:minio /mnt/obj1data
- sudo apt install screen
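The fstab step in the list above can also be scripted. This is a minimal Python sketch, not part of obj_idx: read_partuuid shells out to blkid -s PARTUUID -o value (usually run as root), and fstab_line formats an entry with the same options as above, appending noauto for swappable drives. The function names and the example PARTUUID are hypothetical.

```python
import subprocess


def read_partuuid(device: str) -> str:
    """Return the PARTUUID of a partition such as /dev/sda1 (blkid usually needs root)."""
    # -s PARTUUID limits output to that tag; -o value prints just the value
    result = subprocess.run(
        ["blkid", "-s", "PARTUUID", "-o", "value", device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def fstab_line(partuuid: str, mountpoint: str = "/mnt/obj1data",
               removable: bool = False) -> str:
    """Format an fstab entry matching the one above."""
    options = ["defaults", "noatime"]
    if removable:
        # don't try to mount at boot if the drive may be swapped out
        options.append("noauto")
    # fields: device, mount point, fs type, options, dump, fsck pass
    return f"PARTUUID={partuuid} {mountpoint} ext4 {','.join(options)} 0 2"


if __name__ == "__main__":
    # hypothetical PARTUUID, for illustration only
    print(fstab_line("0123abcd-01", removable=True))
```

On the Pi, fstab_line(read_partuuid("/dev/sda1")) gives the line to append to /etc/fstab; review it before editing the file.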
The hardware needs to be monitored and tuned periodically:
- /usr/bin/vcgencmd measure_temp
- see https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/
  echo 1440 | sudo tee /sys/block/sda/device/timeout
  echo 720 | sudo tee /sys/block/sda/device/eh_timeout
  - these sysfs values reset at reboot, so they need to be reapplied after each boot
- see
  /etc/sysctl.d
- check SMART for the disk
  sudo smartctl -a /dev/sda
- other articles:
  - https://unix.stackexchange.com/questions/541463/how-to-prevent-disk-i-o-timeouts-which-cause-disks-to-disconnect-and-data-corrup
  - https://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
  - https://www.usenix.org/system/files/login/articles/login_summer17_03_aghayev.pdf
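The checks above can be collected into a small Python script run periodically, for example from cron. This is a sketch with hypothetical helper names: parse_temp assumes vcgencmd prints a line such as temp=48.3'C, and read_disk_timeouts reads the same sysfs files the echo commands write.

```python
import re
import subprocess
from pathlib import Path


def parse_temp(output: str) -> float:
    """Parse vcgencmd output such as "temp=48.3'C" into degrees Celsius."""
    match = re.search(r"temp=(\d+(?:\.\d+)?)", output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


def read_temp() -> float:
    """Run vcgencmd on the Pi and return the SoC temperature."""
    result = subprocess.run(["/usr/bin/vcgencmd", "measure_temp"],
                            check=True, capture_output=True, text=True)
    return parse_temp(result.stdout)


def read_disk_timeouts(disk: str = "sda") -> dict:
    """Read the current SCSI command and error-handling timeouts (seconds)."""
    base = Path("/sys/block") / disk / "device"
    return {name: int((base / name).read_text().strip())
            for name in ("timeout", "eh_timeout")}
```

Calling print(read_temp(), read_disk_timeouts()) after a reboot also shows whether the timeout values have fallen back to their defaults and need reapplying.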
sudo shutdown -r now; exit
wget https://dl.min.io/server/minio/release/linux-arm64/minio
- alternatively, GO111MODULE=on go install github.com/minio/minio@latest will compile and install it to ~/go/bin/minio
  - see the official minio docs for more
wget https://dl.min.io/server/mc/release/linux-arm64/mc
chmod a+x minio mc
MINIO_ROOT_USER=minio MINIO_ROOT_PASSWORD=password /home/minio/minio server /mnt/obj1data --address 0.0.0.0:9000 --console-address 0.0.0.0:9001
- can be done as a script like ./start.sh and run in a screen session
- actually set up buckets, users, replication, etc.
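./mc alias set minio http://0.0.0.0:9000 minio password
- the first minio is the alias name used by the commands below
- for more info see the mc docs
./mc admin info minio
./mc admin user add minio USER PASSWORD
./mc mb minio/bucket
- grant access from user to bucket
vim userbucketpolicy.json
- put bucket name(s) in there
./mc admin policy add minio BUCKET-policy userbucketpolicy.json
./mc admin policy set minio BUCKET-policy user=USER
./mc admin user info minio USER
- newer mc releases replace policy add/set with ./mc admin policy create minio BUCKET-policy userbucketpolicy.json and ./mc admin policy attach minio BUCKET-policy --user USER
./mc update && ./mc admin update minio/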
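This is a sketch of generating userbucketpolicy.json for the mc admin policy commands above. MinIO policies use the AWS IAM JSON format. Granting s3:* on each bucket and its objects is an assumption here, so narrow Action (for example to s3:GetObject and s3:ListBucket) for read-only users.

```python
import json


def bucket_policy(buckets: list) -> dict:
    """IAM-style policy allowing all S3 actions on the given buckets and their objects."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # the bucket itself (e.g. listing)
        resources.append(f"arn:aws:s3:::{bucket}/*")  # every object in it
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:*"], "Resource": resources},
        ],
    }


if __name__ == "__main__":
    # redirect stdout to userbucketpolicy.json
    print(json.dumps(bucket_policy(["bucket"]), indent=2))
```

For example, python3 make_policy.py > userbucketpolicy.json, where make_policy.py is a hypothetical name for this script.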
To find the name of the systemd mount unit for the data drive (used in After= below):
$ systemctl list-units | grep '/path/to/objectstore' | awk '{ print $1 }'
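Mount units are named after their mount point, so for /mnt/obj1data the unit is mnt-obj1data.mount, which is what goes in place of objectstoremountpoint.mount below. systemd-escape --path --suffix=mount /mnt/obj1data prints it directly. The Python sketch below is a simplified reimplementation of that escaping, for illustration only; it ignores some edge cases systemd handles.

```python
import string

# Characters systemd keeps as-is in unit names; "/" is mapped to "-" separately.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(path: str) -> str:
    """Simplified equivalent of systemd-escape --path --suffix=mount PATH."""
    components = [c for c in path.split("/") if c]  # drop empty/duplicate slashes
    if not components:
        return "-.mount"  # the root file system
    joined = "/".join(components)
    out = []
    for i, ch in enumerate(joined):
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # anything else, including "-", becomes \xNN for each UTF-8 byte
            out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out) + ".mount"
```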
/etc/systemd/system/minio.service:
[Unit]
Description=MinIO Object Storage Service
After=network-online.target objectstoremountpoint.mount
Wants=network-online.target
[Service]
ExecStart=/home/minio/start.sh
WorkingDirectory=/home/minio
User=minio
Group=minio
[Install]
WantedBy=multi-user.target
$ sudo systemctl start minio
$ sudo systemctl status minio
$ sudo systemctl enable minio
Some info on getting PostgreSQL running on Fedora:
- https://developer.fedoraproject.org/tech/database/postgresql/about.html
- https://docs.fedoraproject.org/en-US/quick-docs/postgresql/
- /usr/share/doc/postgresql/README.rpm-dist
Initial steps to be performed as a sudoer:
sudo dnf install postgresql-server
sudo postgresql-setup --initdb
sudo systemctl start postgresql
sudo su -c "createuser -P USER" postgres # note you will be prompted to create a password
sudo su -c "createdb -O USER DB" postgres
You may also need to edit /var/lib/pgsql/data/pg_hba.conf to use scram-sha-256 instead of ident (or similar) as the authentication method, and then reload PostgreSQL (sudo systemctl reload postgresql).
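The sample API config uses postgresql:///objidx, which connects over the local Unix socket. If the API instead connects over TCP with a password, the URI has to carry the credentials. The helper below is hypothetical and not part of obj_idx; it percent-encodes them so that a password containing @ or / doesn't break the URL.

```python
from urllib.parse import quote


def database_uri(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Build a TCP SQLALCHEMY_DATABASE_URI for password (e.g. scram-sha-256) auth."""
    # percent-encode credentials so characters like "@", ":" or "/" can't break the URL
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{creds}@{host}:{port}/{db}"
```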
The following steps are to be run as the user who will run the API:
OBJIDX_SETTINGS=../samp.cfg python3 -m obj_idx.db_create
pg_dump --schema-only DB > schema.sql
The db_create.py script empties the database and creates the tables in the schema; it uses the same config file as the web app.
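Moving all objects from one bucket to another (in psql):
update object set bucket='new' where bucket='old';
Deleting all objects in a bucket (file rows first, then the objects they reference):
delete from file using object where file.obj_uuid=object.uuid and object.bucket='old';
delete from object where bucket='old';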
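To try the moving and deleting statements safely, here is a sketch using Python's built-in sqlite3 against a hypothetical minimal schema; the real tables created by db_create have more columns. SQLite has no DELETE ... USING, so the join is written as a subquery, a form PostgreSQL accepts as well. Wrapping both deletes in one transaction keeps the index consistent if either one fails.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in with only the columns the SQL above touches (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE object (uuid TEXT PRIMARY KEY, bucket TEXT NOT NULL);
        CREATE TABLE file (obj_uuid TEXT NOT NULL REFERENCES object(uuid), name TEXT);
        INSERT INTO object VALUES ('a', 'old'), ('b', 'old'), ('c', 'keep');
        INSERT INTO file VALUES ('a', 'a.jpg'), ('b', 'b.jpg'), ('c', 'c.jpg');
    """)
    return conn


def move_bucket(conn: sqlite3.Connection, old: str, new: str) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE object SET bucket = ? WHERE bucket = ?", (new, old))


def delete_bucket(conn: sqlite3.Connection, bucket: str) -> None:
    with conn:  # both deletes succeed or neither does
        # file rows first, since they reference object.uuid
        conn.execute(
            "DELETE FROM file WHERE obj_uuid IN "
            "(SELECT uuid FROM object WHERE bucket = ?)", (bucket,))
        conn.execute("DELETE FROM object WHERE bucket = ?", (bucket,))
```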
API config file (for example api.cfg):
DEBUG = True
SQLALCHEMY_DATABASE_URI = 'postgresql:///objidx'
SQLALCHEMY_TRACK_MODIFICATIONS = False
OBJIDX_S3 = 'http://user:pass@localhost:9000/'
OBJIDX_BUCKETS = ['bucket1']
- OBJIDX_S3 is a special URL for S3
- OBJIDX_BUCKETS is a list of buckets that may be used
- The rest are standard Flask and SQLAlchemy options
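This is a sketch of how an OBJIDX_S3 value like the sample can be split into an endpoint and credentials using the standard library. Treating the user and password parts as the S3 access key and secret key is an assumption based on the sample. parse_s3_url is a hypothetical helper, not obj_idx's own parser.

```python
from urllib.parse import unquote, urlsplit


def parse_s3_url(url: str) -> dict:
    """Split e.g. http://user:pass@localhost:9000/ into endpoint and credentials."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    endpoint = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        endpoint += f":{parts.port}"
    return {
        "endpoint": endpoint,
        # assumption: userinfo holds the access key and secret key
        "access_key": unquote(parts.username or ""),
        "secret_key": unquote(parts.password or ""),
        "secure": parts.scheme == "https",
    }
```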
GUI config file (for example gui.cfg):
DEBUG = True
OBJIDX_URL="http://127.0.0.1:5000/" # change if running on a different host
OBJIDX_AUTH="user" # currently just a username, since there is no auth at the API level yet; ideally passed through in the future
A failed upload must first be cleared by PUTting or PATCHing the object at /object/<object-uuid>/ with {"deleted": true} to signify that the upload has stopped.
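This is a minimal sketch of that request using only the standard library. Several details are assumptions: the JSON body and Content-Type header, a base URL matching OBJIDX_URL in the GUI config, and the lack of authentication. clear_failed_upload is a hypothetical helper, not part of obj-idx-client.

```python
import json
import urllib.request


def clear_failed_upload(api_url: str, object_uuid: str,
                        method: str = "PATCH") -> urllib.request.Request:
    """Build the request that flags a failed upload as stopped ({"deleted": true})."""
    url = f"{api_url.rstrip('/')}/object/{object_uuid}/"
    body = json.dumps({"deleted": True}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method=method,
        headers={"Content-Type": "application/json"},  # assumed JSON API
    )
```

Passing the result to urllib.request.urlopen sends it, and urlopen raises urllib.error.HTTPError on a 4xx/5xx reply. Use method="PUT" if the API expects PUT.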
Essentially, the lifecycle state machine of an object looks something like this:
1. Initial POST upload: new status (completed: false, deleted: false)
   - the upload to the object store is assumed to start shortly
   - subsequent upload attempts will fail
2. Successful object upload
3. PUT the object with completed=true to signify completion: normal status (completed: true, deleted: false)
The initial client may retry step 2 as many times as needed; however, to start from scratch, the object needs to be put in "retry" mode (completed: false, deleted: true) as described above.
Finally, once an object is in the normal state, it may be marked as intentionally and permanently deleted (i.e. with no option or desire to retry) by putting it in the deleted state (completed: true, deleted: true). Putting it in this state doesn't actually delete it from the object store, though.
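The lifecycle can be summarized as a small state machine over the two flags. This Python sketch is one way to encode the rules above for a client-side check, and all names in it are hypothetical. Retries of step 2 don't change either flag, so they are not modeled as transitions.

```python
from enum import Enum


class State(Enum):
    """Object lifecycle states, keyed by the (completed, deleted) flags."""
    NEW = (False, False)     # POSTed; upload to the object store expected shortly
    RETRY = (False, True)    # failed upload cleared; start from scratch
    NORMAL = (True, False)   # upload confirmed with completed=true
    DELETED = (True, True)   # intentionally, permanently deleted (still in the object store)


# Transitions described above. What follows RETRY isn't spelled out, so it is
# modeled here as having no further flag changes.
TRANSITIONS = {
    State.NEW: {State.NORMAL, State.RETRY},
    State.RETRY: set(),
    State.NORMAL: {State.DELETED},
    State.DELETED: set(),
}


def state_of(completed: bool, deleted: bool) -> State:
    """Map the API's two flags to a lifecycle state."""
    return State((completed, deleted))


def transition(current: State, target: State) -> State:
    """Return target if the lifecycle allows it, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target
```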