This change updates the Ceph images to 18.2.2 images patched with a
fix for https://tracker.ceph.com/issues/63684. It also reverts the
package repository in the deployment scripts to use the debian-reef
directory on download.ceph.com instead of debian-18.2.1. The issue
with the repo that prompted the previous change to debian-18.2.1
has been resolved and the more generic debian-reef directory may
now be used again.
Change-Id: I85be0cfa73f752019fc3689887dbfd36cec3f6b2
In some cases when OSD metadata disks are reused and redeployed,
lvcreate can fail to create a DB or WAL volume because it overlaps
an old, deleted volume on the same disk whose signature still
exists at the offsets that trigger detection; lvcreate then asks
the user whether or not to wipe the old signature and aborts the
LV creation process. Adding a --yes argument to the lvcreate
command automatically answers yes to the wipe question and allows
lvcreate to wipe the old signature.
Change-Id: I0d69bd920c8e62915853ecc3b22825fa98f7edf3
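The fix amounts to a one-flag change; a minimal sketch, where the wrapper function and variable names are illustrative rather than the chart's actual code:

```shell
#!/bin/bash
# Hypothetical sketch of the fix above; the wrapper function and
# variable names are illustrative, not the chart's actual code.

create_metadata_volume() {
  local size="$1" name="$2" vg="$3"
  # --yes answers lvcreate's wipe-old-signature prompt automatically,
  # so a stale signature from a deleted volume no longer aborts
  # LV creation.
  lvcreate --yes -L "$size" -n "$name" "$vg"
}
```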
There exists a case for bluestore OSDs where the OSD init process
detects that an OSD has already been initialized in the deployed
Ceph cluster, but the cluster osdmap does not have an entry for it.
This change corrects this case to zap and reinitialize the disk
when OSD_FORCE_REPAIR is set to 1. It also clarifies a log message
in this case when OSD_FORCE_REPAIR is 0 to state that a manual
repair is necessary.
Change-Id: I2f00fa655bf5359dcc80c36d6c2ce33e3ce33166
This change converts the readiness and liveness probes in the Ceph
charts to use the functions from the Helm toolkit rather than
having hard-coded probe definitions. This allows probe configs to
be overridden in values.yaml without rebuilding charts.
Change-Id: I68a01b518f12d33fe4f87f86494a5f4e19be982e
In some cases, especially for disruptive OSD restarts on upgrade,
PGs can take longer than the allowed ~30 seconds to get into a
peering state. In these cases, the post-apply job fails prematurely
instead of allowing time for the OSDs and PGs to recover. This
change extends that timeout to ~10 minutes instead to allow the PGs
plenty of recovery time.
The only negative effect of this change is that a legitimate
failure where the PGs can't recover will take 10 minutes to fail
the post-apply job instead of 30 seconds.
Change-Id: I9c22bb692385dbb7bc2816233c83c7472e071dd4
This change updates all Ceph image references to use Focal images
for all charts in openstack-helm-infra.
Change-Id: I759d3bdcf1ff332413e14e367d702c3b4ec0de44
Based on the spec in the openstack-helm repo,
support-OCI-image-registry-with-authentication-turned-on.rst
Each Helm chart can configure an OCI image registry and
credentials to use. A Kubernetes Secret is then created with this
information. Each ServiceAccount then specifies an imagePullSecret
referencing the Secret that holds the registry credentials, so any
pod using one of these ServiceAccounts may pull images from the
authenticated container registry.
Change-Id: Iebda4c7a861aa13db921328776b20c14ba346269
It is possible for misbehaving ceph-mon pods to cause the ceph-osd
liveness probe to fail for healthy ceph-osd pods, which can cause
healthy pods to get restarted unnecessarily. This change removes
the ceph-mon query from the ceph-osd liveness probe so the probe
is only dependent on ceph-osd state.
Change-Id: I9e1846cfdc5783dbb261583e04ea19df81d143f4
There are bugs with containerizing certain udev operations in some
udev versions. The osd-init container can hang in these
circumstances, so the osd-init scripts are modified not to use
these problematic operations.
Change-Id: I6b39321b849f5fbf1b6f2097c6c57ffaebe68121
This change allows OSDs to be restarted unconditionally by the
ceph-osd chart. This can be useful in upgrade scenarios where
ceph-osd pods are unhealthy during the upgrade.
Change-Id: I6de98db2b4eb1d76411e1dbffa65c263de3aecee
The new, disruptive post-apply logic to restart ceph-osd pods more
efficiently on upgrade still waits for pods to be in a non-
disruptive state before restarting them disruptively. This change
skips that wait if a disruptive restart is in progress.
Change-Id: I484a3b899c61066aab6be43c4077fff2db6f54bc
Currently the ceph-osd post-apply job always restarts OSDs without
disruption. This requires waiting for a healthy cluster state in
between failure domain restarts, which isn't possible in some
upgrade scenarios. In those scenarios where disruption is
acceptable and a simultaneous restart of all OSDs is required,
the disruptive_osd_restart value now provides this option.
Change-Id: I64bfc30382e86c22b0f577d85fceef0d5c106d94
Since we are about to use wildcards in storage locations,
it is possible to have multiple matches, so we need to add
a precheck before using the $STORAGE_LOCATION, $BLOCK_DB and
$BLOCK_WAL variables to ensure that the stored strings each
resolve to one and only one block location.
Signed-off-by: Ruslan Aliev <raliev@mirantis.com>
Change-Id: I60180f988e90473e200e886b69788cc263359ad2
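Such a precheck can be sketched as follows; the helper name is an assumption, not the chart's actual code. The idea is to expand the configured pattern and require exactly one existing match before assigning it.

```shell
#!/bin/bash
# Illustrative precheck: a wildcard storage location must resolve to
# exactly one existing block location before it is used for
# STORAGE_LOCATION, BLOCK_DB or BLOCK_WAL.

resolve_single_location() {
  local pattern="$1"
  # Unquoted expansion performs the glob; an unmatched pattern stays
  # literal and is rejected by the existence check below.
  # shellcheck disable=SC2206
  local matches=( $pattern )
  if [ "${#matches[@]}" -ne 1 ] || [ ! -e "${matches[0]}" ]; then
    echo "ERROR: '$pattern' must resolve to exactly one block location" >&2
    return 1
  fi
  echo "${matches[0]}"
}
```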
This is a code improvement to reuse the ceph monitor discovering
function in different templates. Calling the above-mentioned
function from a single place (helm-infra snippets) reduces code
maintenance and simplifies further development.
Rev. 0.1 Charts version bump for ceph-client, ceph-mon, ceph-osd,
ceph-provisioners and helm-toolkit
Rev. 0.2 Mon endpoint discovery functionality added for
the rados gateway. ClusterRole and ClusterRoleBinding added.
Rev. 0.3 checkdns is allowed to correct ceph.conf for RGW deployment.
Rev. 0.4 Added RoleBinding to the deployment-rgw.
Rev. 0.5 Remove _namespace-client-ceph-config-manager.sh.tpl and
the appropriate job, because of duplicated functionality.
Related configuration has been removed.
Rev. 0.6 RoleBinding logic has been changed to meet rules:
checkdns namespace - HAS ACCESS -> RGW namespace(s)
Change-Id: Ie0af212bdcbbc3aa53335689deed9b226e5d4d89
The wait for misplaced objects during the ceph-osd post-apply job
was added to prevent I/O disruption in the case where misplaced
objects cause multiple replicas in common failure domains. This
concern is only valid before OSD restarts begin because OSD
failures during the restart process won't cause replicas that
violate replication rules to appear elsewhere.
This change keeps the wait for misplaced objects prior to beginning
OSD restarts and removes it during those restarts. The wait during
OSD restarts now only waits for degraded objects to be recovered
before proceeding to the next failure domain.
Change-Id: Ic82c67b43089c7a2b45995d1fd9c285d5c0e7cbc
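The narrower wait during restarts can be sketched as below; the JSON parsing and function names are assumptions, not the job's actual code. Misplaced objects are deliberately ignored here, and only degraded objects block progress to the next failure domain.

```shell
#!/bin/bash
# Hedged sketch: during OSD restarts, wait only for degraded objects;
# misplaced objects no longer block the next failure domain.

degraded_objects() {
  local n
  # Compact-JSON parsing of `ceph status` output is an assumption.
  n="$(ceph status -f json \
    | grep -o '"degraded_objects":[0-9]*' \
    | grep -o '[0-9]*$' || true)"
  echo "${n:-0}"
}

wait_for_degraded_objects() {
  while [ "$(degraded_objects)" -gt 0 ]; do
    echo "Waiting for degraded objects to recover..."
    sleep 3
  done
}
```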
The log-runner previously was not included in the mandatory access
control (MAC) annotation for the OSD pods, which means it could not
have any AppArmor profile applied to it. This patchset adds that
capability for that container.
Change-Id: I11036789de45c0f8f66b51e15f2cc253e6cb230c
This change updates the helm-toolkit path in each chart as part
of the move to Helm v3. This is needed because Helm v3 removed
the `helm serve` command.
Change-Id: I011e282616bf0b5a5c72c1db185c70d8c721695e
If labels are not specified on a Job, kubernetes defaults them
to include the labels of their underlying Pod template. Helm 3
injects metadata into all resources [0] including an
`app.kubernetes.io/managed-by: Helm` label. Thus when kubernetes
sees a Job's labels they are no longer empty and thus do not get
defaulted to the underlying Pod template's labels. This is a
problem since Job labels are depended on by
- Armada pre-upgrade delete hooks
- Armada wait logic configurations
- kubernetes-entrypoint dependencies
Thus for each Job template this adds labels matching the
underlying Pod template to retain the same labels that were
present with Helm 2.
[0]: https://github.com/helm/helm/pull/7649
Change-Id: I3b6b25fcc6a1af4d56f3e2b335615074e2f04b6d
This PS changes the log-runner user ID to run as the ceph user
so that it has the appropriate permissions to write to /var/log/ceph
files.
Change-Id: I4dfd956130eb3a19ca49a21145b67faf88750d6f
The checkDNS script which is run inside the ceph-mon pods has had
a bug for a while now. If a value of "up" is passed in, it adds
brackets around it, but then doesn't check for the brackets when
checking for a value of "up". This causes a value of "{up}" to be
written into the ceph.conf for the mon_host line and that causes
the mon_host to not be able to respond to ceph/rbd commands. It's
normally not a problem if DNS is working, but if DNS stops working
this can happen.
This patch changes the comparison to look for "{up}" instead of
"up" in three different files, which should fix the problem.
Change-Id: I89cf07b28ad8e0e529646977a0a36dd2df48966d
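The bug reduces to a string comparison; a minimal reproduction of the logic, where the function names are illustrative rather than the checkDNS script's actual code:

```shell
#!/bin/bash
# Minimal reproduction of the checkDNS bug; names are illustrative.

format_mon_host() {
  # The script brackets whatever value it is passed,
  # so "up" becomes "{up}".
  echo "{$1}"
}

update_mon_host() {
  local mon_host
  mon_host="$(format_mon_host "$1")"
  # The fix: compare against "{up}" rather than "up", so the sentinel
  # is recognized and the literal "{up}" never reaches the mon_host
  # line in ceph.conf.
  if [ "$mon_host" = "{up}" ]; then
    echo "mon_host is up; leaving ceph.conf unchanged"
  else
    echo "mon_host=$mon_host"
  fi
}
```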
Since it would be a security violation to mount anything under the
/var partition into pods, this changes the mount propagation to
HostToContainer.
Change-Id: If7a27304507a9d1bcb9efcef4fc1146f77080a4f
Wherever possible, the ceph-osd containers need to run with the
least amount of privilege required. In some cases privileges are
granted that are not needed. This patchset modifies those
containers' security contexts to reduce them to only what is
needed.
Change-Id: I0d6633efae7452fee4ce98d3e7088a55123f0a78
This change adds /var/crash as a host-path volume mount for
ceph-osd pods in order to facilitate core dump capture when
ceph-osd daemons crash.
Change-Id: Ie517c64e08b11504f71d7d570394fbdb2ac8e54e
This change configures Ceph daemon pods so that
/var/lib/ceph/crash maps to a hostPath location that persists
when the pod restarts. This will allow for post-mortem examination
of crash dumps to attempt to understand why daemons have crashed.
Change-Id: I53277848f79a405b0809e0e3f19d90bbb80f3df8
Some minor improvements are made in this patchset:
1) Move osd_disk_prechecks to the very beginning to make sure the
required variables are set before running the bulk of the script.
2) Specify variables in a more consistent manner for readability.
3) Remove variables from CLI commands that are not used/set.
Change-Id: I6167b277e111ed59ccf4415e7f7d178fe4338cbd
This will ease mirroring capabilities for the docker official images.
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I0f9177b0b83e4fad599ae0c3f3820202bf1d450d
1) Removed some remaining unsupported ceph-disk related code.
2) Refactored the code that determines when a disk should be
zapped. Now there will be only one place where disk_zap is
called.
3) Refactored the code that determines when LVM prepare should
be called.
4) Improved the logging within the OSD init files.
Change-Id: I194c82985f1f71b30d172f9e41438fa814500601
This is the first of multiple updates to ceph-osd where the OSD
init code will be refactored for better sustainability.
This patchset makes 2 changes:
1) Removes "ceph-disk" support, as ceph-disk has been removed from
the Ceph images since Nautilus.
2) Separates the initialization code for the bluestore, filestore,
and directory backend configuration options.
Change-Id: I116ce9cc8d3bac870adba8b84677ec652bbb0dd4
Directory-based OSDs are failing to deploy because 'python' has
been replaced with 'python3' in the image. This change updates the
python commands to use python3 instead.
There is also a dependency on forego, which has been removed from
the image. This change also modifies the deployment so that it
doesn't depend on forego.
Ownership of the OSD keyring file has also been changed so that it
is owned by the 'ceph' user, and the ceph-osd process now uses
--setuser and --setgroup to run as the same user.
Change-Id: If825df283bca0b9f54406084ac4b8f958a69eab7
Deploying with Helm 3 fails because Helm 3 no longer supports
rbac.authorization.k8s.io/v1beta1; the v1 API is supported by
both Helm 2 and Helm 3 (liujinyuan@inspur.com).
Change-Id: I40a5863c80489db8ea40028ffb6d89c43f6771d6
The volume naming convention prefixes logical volume names with
ceph-lv-, ceph-db-, or ceph-wal-. The code that was added recently
to remove orphaned DB and WAL volumes does a string replacement of
"db" or "wal" with "lv" when searching for corresponding data
volumes. This causes DB volumes to get identified incorrectly as
orphans and removed when "db" appears in the PV UUID portion of
the volume name.
Change-Id: I0c9477483b70c9ec844b37a6de10a50c0f2e1df8
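The failure mode is easiest to see as an anchored vs. unanchored substitution; a sketch, where the helper name is illustrative and the exact form of the original replacement is an assumption:

```shell
#!/bin/bash
# Illustrative fix: derive the data-volume name by replacing only the
# ceph-db-/ceph-wal- prefix. A global replacement of "db" would also
# rewrite a "db" that happens to appear in the PV UUID portion of the
# name, so the derived name would match no data volume and the DB
# volume would look orphaned.

data_volume_for() {
  echo "$1" | sed -e 's/^ceph-db-/ceph-lv-/' -e 's/^ceph-wal-/ceph-lv-/'
}
```

For a hypothetical volume name like ceph-db-1adb9c, a global "db" replacement would yield ceph-lv-1alv9c, whereas the anchored form above correctly yields ceph-lv-1adb9c.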
Found another issue in disk_zap() where a needed update was missed when
https://review.opendev.org/c/openstack/openstack-helm-infra/+/745166
changed the logical volume naming convention.
The above patch set renamed volumes that followed the old convention,
so this logic will never be correct and must be updated.
Also added logic to clean up orphaned DB/WAL volumes if they are
encountered and removed some cases where a data disk is marked as in use
when it isn't set up correctly.
Change-Id: I8deeecfdb69df1f855f287caab8385ee3d6869e0
OSD logical volume names used to be based on the logical disk path,
i.e. /dev/sdb, but that has changed. The lvremove logic in disk_zap()
is still using the old naming convention. This change fixes that.
Change-Id: If32ab354670166a3c844991de1744de63a508303
There are many race conditions possible when multiple ceph-osd
pods are initialized on the same host at the same time using
shared metadata disks. The locked() function was introduced a
while back to address these, but some commands weren't locked,
locked() was being called all over the place, and there was a file
descriptor leak in locked(). This change cleans that up by
maintaining a single, global file descriptor for the lock file
that is only opened and closed once, and also by aliasing all of
the commands that need to use locked() and removing explicit calls
to locked() everywhere.
The global_locked() function has also been removed as it isn't
needed when individual commands that interact with disks use
locked() properly.
Change-Id: I0018cf0b3a25bced44c57c40e33043579c42de7a
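The scheme described above can be sketched as follows; the lock path, descriptor handling, and aliasing are assumptions rather than the chart's exact code, and the sketch requires bash and the util-linux flock utility:

```shell
#!/bin/bash
# Hedged sketch of the cleanup: one lock file, opened once on a
# global descriptor, with disk-touching commands routed through
# locked().

LOCK_FILE="${TMPDIR:-/tmp}/ceph-osd-init.lock"

open_lock_fd() {
  # Bash allocates a free descriptor into LOCK_FD; it stays open for
  # the life of the script, so repeated locking leaks no descriptors.
  exec {LOCK_FD}>"${LOCK_FILE}"
}

locked() {
  # Serialize the wrapped command against other pods on this host.
  flock -w 600 "${LOCK_FD}"
  "$@"
  local rc=$?
  flock -u "${LOCK_FD}"
  return $rc
}

# Aliasing disk commands through locked() replaces the explicit
# locked() calls that were scattered across the scripts
# (illustrative):
pvdisplay() { locked "$(command -v pvdisplay)" "$@"; }
```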
The default, directory-based OSD configuration doesn't appear to work
correctly and isn't really being used by anyone. It has been commented
out and the comments have been enhanced to document the OSD config
better. With this change there is no default configuration anymore, so
the user must configure OSDs properly in their environment in
values.yaml in order to deploy OSDs using this chart.
Change-Id: I8caecf847ffc1fefe9cb1817d1d2b6d58b297f72
OSD failures during an update can cause degraded and misplaced
objects. The post-apply job restarts OSDs in failure domain
batches in order to accomplish the restarts efficiently. There is
already a wait for degraded objects to ensure that OSDs are not
restarted on degraded PGs, but misplaced objects could mean that
multiple object replicas exist in the same failure domain, so the
job should wait for those to recover as well before restarting
OSDs in order to avoid potential disruption under these failure
conditions.
Change-Id: I39606e388a9a1d3a4e9c547de56aac4fc5606ea2
A recent change to wait_for_pods() to allow for fault tolerance
appears to be causing wait_for_pgs() to fail and exit the post-
apply script prematurely in some cases. The existing
wait_for_degraded_objects() logic won't pass until pods and PGs
have recovered while the noout flag is set, so the pod and PG
waits can simply be removed.
Change-Id: I5fd7f422d710c18dee237c0ae97ae1a770606605
This PS updates the post-apply job to check multiple times for
inactive PGs that are not peering. The wait_for_pgs() function now
fails only after 10 sequential checks find such PGs.
Change-Id: I98359894477c8e3556450b60b25d62773666b034
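The retry logic can be sketched roughly as below; the PG-state query is an assumption (a real implementation would classify every non-active, non-peering state), and only the 10-consecutive-checks behaviour mirrors the change itself.

```shell
#!/bin/bash
# Hedged sketch: inactive PGs that are not peering fail the job only
# after 10 consecutive checks, so transient states no longer abort
# the post-apply script.

inactive_pgs() {
  # Counting "unknown" PGs stands in for the real non-active,
  # non-peering classification; compact-JSON parsing is an assumption.
  ceph pg ls -f json | grep -c '"state":"unknown"'
}

wait_for_pgs() {
  local stuck=0
  while [ "$(inactive_pgs)" -gt 0 ]; do
    stuck=$((stuck + 1))
    if [ "$stuck" -ge 10 ]; then
      echo "ERROR: PGs still inactive, not peering after ${stuck} checks" >&2
      return 1
    fi
    sleep 3
  done
}
```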
This PS updates the wait_for_pods() function in the post-apply
script. The changes allow wait_for_pods() to pass once the required
percentage of OSDs (REQUIRED_PERCENT_OF_OSDS) has been reached. A
piece of code that is no longer needed has also been removed.
Change-Id: I56f1292682cf2aa933c913df162d6f615cf1a133
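The relaxed condition reduces to a percentage comparison; a minimal sketch, where the helper, its inputs, and the default value are illustrative and only the REQUIRED_PERCENT_OF_OSDS name comes from the change itself:

```shell
#!/bin/bash
# Hedged sketch: wait_for_pods() may pass once the required
# percentage of OSDs is up, rather than requiring every OSD pod.

osds_ready() {
  local up="$1" total="$2"
  # The default of 75 is an assumption, not the chart's value.
  local required="${REQUIRED_PERCENT_OF_OSDS:-75}"
  # Integer arithmetic: up/total >= required/100, no floating point.
  [ $(( up * 100 )) -ge $(( total * required )) ]
}
```

A caller would feed this from its own pod or OSD counts, e.g. breaking out of its wait loop once osds_ready succeeds.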
There are race conditions in the ceph-volume osd-init script that
occasionally cause deployment and OSD restart issues. This change
attempts to resolve those and stabilize the script when multiple
instances run simultaneously on the same host.
Change-Id: I79407059fa20fb51c6840717a083a8dc616ba410
This improves the logic that detects used OSD disks so that the
scripts will not zap OSD disks aggressively. It also adds a
debugging mode for pvdisplay commands to capture more logs during
failure scenarios, and reads the OSD force repair flag from
values.
Change-Id: Id2996211dd92ac963ad531f8671a7cc8f7b7d2d5
This fixes the synchronization between Ceph OSDs when they use a
shared disk for metadata, as they were conflicting while preparing
the metadata disk. A lock is added when the first OSD prepares the
shared metadata disk so that the other OSDs wait for the lock. A
udev settle is also added in a few places to pick up the latest
tags on LVM devices.
Change-Id: I018bd12a3f02cf8cd3486b9c97e14b138b5dac76
This addresses an issue that can prevent some OSDs from being able
to restart properly after they have been deployed. Some OSDs try to
prepare their disks again on restart and end up crash looping. This
change fixes that.
Change-Id: I9edc1326c3544d9f3e8b6e3ff83529930a28dfc6
The existing search for logical volumes to determine if an OSD data
disk is already in use is incomplete and can yield false positives in
some cases. This change makes the search more correct and specific in
order to avoid those.
Change-Id: Ic2d06f7539567f0948efef563c1942b71e0293ff