feat: new post
parent 073d2b4c1e
commit 2fd5368635
2 changed files with 243 additions and 1 deletion

content/glusterfs.md (new file, 242 lines added)

@@ -0,0 +1,242 @@
+++
title = "Glusterfs"
date = "2023-12-13 16:27:07-08:00"
[taxonomies]
tags = ["linux", "nix", "ramble"]
+++

# What's a Glusterfs?

Glusterfs is a network filesystem with many features, but the important ones
here are its ability to live on top of another filesystem and to offer high
availability. If you have used SSHFS, it's quite similar in concept, giving you
a "fake" filesystem from a remote machine that you can use just like normal,
without caring about the details of where the files are actually stored beyond
"over there I guess". Unlike SSHFS, Glusterfs can be spread across multiple
machines, similar to network RAID. If one machine goes down, the data is still
all there and well.

# Why even bother?

A few years ago I decided that I was tired of managing docker services per
machine and wanted them in a swarm. No more thinking! If a machine goes down,
the service is either still up (already replicated across servers, like this
blog), or it will come up on another server once the swarm sees the service
isn't alive. This is all well and good until the SAN needs to go down. Now all
of the data is missing, the servers don't know it, and you basically have to
kick the entire cluster over to get it back alive. Not exactly ideal, to say
the least.

## Side rant. Feel free to skip if you only care about the tech bits.

While ZFS has kept my data very secure over the ages, it can't always prevent
machine oddity. I have had strange issues such as Ryzen bugs that could lock up
machines at idle, a still-not-figured-out random hang on networking (despite
changing 80% of the machine, including all disks, the operating system, and
the network cards) before it comes back 10 seconds later, and so on. As much as
I always want to have a reliable machine, updates will require service
restarts, reboots need to be done, and honestly, I'm tired of having to babysit
computers. Docker swarm and NixOS are in my life because I don't want to
babysit; I want to solve problems once and be done with it. Storage stability
was the next nail to hit. Despite being arguably a small problem, it still
reminded me that computers exist when I wasn't in the mood for them to exist.

# Why Glusterfs as opposed to Ceph or anything else?

Glusterfs sits on top of a filesystem. This is the feature that took me to it
over anything else. I have trusted my data to ZFS for many years, and have done
countless things that should have cost me data, including "oops, I deleted 2TB
of data on the wrong machine" and having to force power off machines (usually
SystemD reasons), and all of my data is safe. For the very few things it
couldn't save me from, it happily told me where the corruption was so I could
replace that limited data from a backup. With all of that said, Glusterfs
happily lives on top of ZFS, even letting me use datasets just as I have been
for ages, while also letting me expand across several machines. There are a ton
of modes to Glusterfs, much like any "RAID software", but I'm sticking to what
is effectively a mirror (RAID 1). Let's look at the hardware setup to explain
this a bit better.

# The hardware

planex

- Ryzen 5700
- 32GB RAM
- 2x16TB Seagate Exos
- 2x1TB Crucial MX500

```
pool
--------------------------
exos
  mirror-0
    wwn-0x5000c500db2f91e8
    wwn-0x5000c500db2f6413
special
  mirror-1
    wwn-0x500a0751e5b141ca
    wwn-0x500a0751e5aff797
--------------------------
```

morbo

- Ryzen 2700
- 32GB RAM
- 5x3TB Western Digital Red
- 1x10TB Western Digital (replaced a red when it died)
- 2x500GB Crucial MX500

```
red
  raidz2-0
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EVYXPT
    ata-WDC_WD100EMAZ-00WJTA0_1EG9UBBN
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6ARC4SV
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6ARCZ43
    ata-WDC_WD30EFRX-68N32N0_WD-WCC7K2KU0FUR
    ata-WDC_WD30EFRX-68N32N0_WD-WCC7K7FD8T6K
special
  mirror-2
    ata-CT500MX500SSD1_1904E1E57733-part2
    ata-CT500MX500SSD1_2005E286AD8B-part2
logs
  mirror-1
    ata-CT500MX500SSD1_1904E1E57733-part1
    ata-CT500MX500SSD1_2005E286AD8B-part1
--------------------------------------------
```

kif

- Intel i3 4170
- 8GB RAM
- 2x256GB Inland SSD

```
pool
-------------------------------
inland
  mirror-0
    ata-SATA_SSD_22082224000061
    ata-SATA_SSD_22082224000174
-------------------------------
```

### Notes

These machines are a bit different in terms of storage layout. Morbo and Planex
both actually store decent amounts of data, and kif is there just to help
validate things, so it doesn't get a lot of anything. We'll see why later.
Would giving Morbo and Planex identical disk layouts increase performance? Yes,
but so would SSDs for all of the data. Tradeoffs.

# ZFS setup

I decided to make my setup simpler on all of my systems and just keep the mount
points for glusterfs the same. On each system, I created a dataset named
`gluster` and set its mountpoint to `/mnt/gluster`. This means I don't have to
remember which machine has data where, and it keeps things streamlined. It may
look something like this.

```bash
zfs create pool/gluster
zfs set mountpoint=/mnt/gluster pool/gluster
```
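
Since the whole point is that every machine looks the same, a quick sanity
check that the dataset really landed at `/mnt/gluster` doesn't hurt. A minimal
check, assuming the pool is named `pool` as in the example above:

```bash
# Should report /mnt/gluster for mountpoint and yes for mounted
zfs get mountpoint,mounted pool/gluster
```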

If you have one disk, or just want everything on gluster, you could simply
mount the entire drive/pool somewhere you'll remember, but I find it simplest
to use datasets, and I still have to migrate data from outside of gluster on
the same array to inside of gluster. That's it for ZFS-specific things.
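
That migration itself isn't ZFS-specific; once the volume is mounted (further
down), it's just a copy into the client mount so every replica gets written. A
rough sketch, where `/mnt/tank/media` stands in for a hypothetical old location
and `/mnt/media` is wherever the gluster volume ends up mounted:

```bash
# Copy through the gluster client mount, never directly into the brick
# directories under /mnt/gluster
rsync -aHAX --info=progress2 /mnt/tank/media/ /mnt/media/
```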
# Creating a gluster storage pool
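
One prerequisite that's easy to gloss over: the machines have to be in a
trusted storage pool before a volume can span them. If you haven't already
peered them, it looks roughly like this from any one node (hostnames are mine
from above):

```bash
# Run from planex; each node only needs to be probed once
gluster peer probe morbo
gluster peer probe kif

# Verify that everyone sees everyone
gluster peer status
```

With the peers connected, the volume itself is one command.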

```bash
gluster volume create media replica 2 arbiter 1 planex:/mnt/gluster/media morbo:/mnt/gluster/media kif:/mnt/gluster/media force
```

This may look like a blob of text that means nothing, so let's look at what it
does.

```bash
# Tells gluster that we want to make a volume named "media"
gluster volume create media

# "replica 2 arbiter 1" tells gluster to use the first 2 servers to store the
# full data in a mirror (replica) and to set the last one as an arbiter. The
# arbiter acts as a tie breaker for the case that anything ever disagrees and
# you need a source of truth. It costs VERY little space to store this.
replica 2 arbiter 1

# The server names, and the path on each that we are using to store data
planex:/mnt/gluster/media
morbo:/mnt/gluster/media
kif:/mnt/gluster/media

# Normally you want gluster to create its own directory. When we use datasets,
# the folder will already exist. Understand that this can cause issues if you
# point it at the wrong place, so check first.
force
```

If all goes well, you can start the volume with

```bash
gluster volume start media
```

You'll want to check the status once it's started, and it should look something
like this.
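
For reference, that check is just the status subcommand pointed at the volume
we created above; the output below is what it prints on my cluster:

```bash
gluster volume status media
```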

```bash
Status of volume: media
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick planex:/mnt/gluster/media             57715     0          Y       1009102
Brick morbo:/mnt/gluster/media              57485     0          Y       1530585
Brick kif:/mnt/gluster/media                54466     0          Y       1015000
Self-heal Daemon on localhost               N/A       N/A        Y       1009134
Self-heal Daemon on kif                     N/A       N/A        Y       1015144
Self-heal Daemon on morbo                   N/A       N/A        Y       1854760

Task Status of Volume media
------------------------------------------------------------------------------
```

With that taken care of, you can now mount your Gluster volume on any machine
that needs it! Just follow the normal instructions for your platform to install
Gluster, as they will be different for each one. On NixOS at the time of
writing, I'm using this to manage Glusterfs for my docker swarm on any machine
hosting storage:
<https://git.kdb424.xyz/kdb424/nixFlake/src/commit/5a1c902d0233af2302f28ba30de4fec23ddaaac9/common/networking/gluster.nix>
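
For machines that aren't on NixOS, the stock packages are usually enough. A
hedged example for Debian/Ubuntu (package and service names as they exist
there; other distros will differ):

```bash
# Storage nodes need the full daemon
sudo apt install glusterfs-server
sudo systemctl enable --now glusterd

# Machines that only mount the volume just need the client bits
sudo apt install glusterfs-client
```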

# Using gluster volumes

Once a volume is started, you can mount it by pointing at any machine that has
data in the volume. In my case I can mount from planex/morbo/kif, and even if
one goes down, the data is still served. You can treat this mount identically
to storing files locally or over NFS/SSHFS, and any data stored on it will be
replicated and stay highly available if a server needs to go down for
maintenance or has issues. This provides a bit of a backup (in the same way
that a RAID mirror does; never rely on online machines for a full backup), so
not only can it give you higher uptime on data, but if you currently replicate
data on a schedule as a backup to a machine that's always on, this does that in
real time, which is a nice side effect.
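
As a concrete sketch of what that mount looks like: the `/mnt/media` target is
just an example path, and `backup-volfile-servers` tells the client which other
nodes can hand out the volume layout if the named one happens to be down.

```bash
# One-off mount, pointing at any node that serves the volume
sudo mkdir -p /mnt/media
sudo mount -t glusterfs planex:/media /mnt/media

# Roughly equivalent /etc/fstab line, with fallback servers
# planex:/media  /mnt/media  glusterfs  defaults,_netdev,backup-volfile-servers=morbo:kif  0 0
```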

# Now what?

With my docker swarm able to be served without interruption from odd quirks,
and with gluster replacing my need for ZFS send/recv backups between live
machines (please have a cold storage backup in a fire box if you care about
your data, along with an off-site backup), I can continue to forget that
computers exist and focus on things I want to work on, like eventually setting
up email alerts for ZFS scrubs or
[S.M.A.R.T.](https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology)
scans with any drive warnings. I can continue to mostly forget about the
details and stay focused on the problems that are fun to solve. Yes, I could
host my data elsewhere, but even ignoring the insane cost that I won't pay, I
get to actually own my data and not have a company creeping on things. Just
because I have nothing to hide doesn't mean I leave my door unlocked.

### Obligatory "things I say I won't do, but probably will later"

- Dual network paths. A network switch or cable can knock machines offline.
- Dual routers! Router upgrades always take too long. 5 minutes offline isn't
  acceptable these days!
- Discover the true power of TempleOS.

@@ -59,7 +59,7 @@ for security, so you can run `direnv allow .` in the directory once and it will
 be allowed to load from that point on when you `cd` into the directory, and
 unload when you leave.

-## After throughts
+## After thoughts

 Not only does this allow you to keep your system cleaner by keeping env vars and
 packages out of the system and user's packages, it allows you to keep that