The idea is to stop changing the mcollective config and restarting
mcollective from nailgun-agent, which causes a lot of problems when
the restart happens at the wrong time. From now on mcollective is
expected to be configured and started by the startup scripts while in
bootstrap, and by cloud-init at first boot into the provisioned node.
Change-Id: Ic8e31d6381d8ffb8f7fdfd1aa8ebc655bb4535ec
Partial-Bug: #1585671
Depends-On: Ia2f984570b38642b1090f6483ed3fa78958550c5
When nailgun-agent starts before device mapper has assembled the
multipath devices, it reports the physical disks that back multipath
devices as ordinary disks.
This patch does the following:
* filters out devices that have DM_MULTIPATH_DEVICE_PATH = 1
* runs 'udevadm settle' before scanning multipath devices
* delays scanning multipath devices until building block device info
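
The filtering step can be sketched roughly as follows. This is only an
illustration, not the code from the patch: the helper name
`multipath_path?` and the sample property text are assumptions, and it
presumes the udev properties of a device are available as `KEY=value`
lines.

```ruby
# Hypothetical helper: true if udev marks the device as one path of a
# multipath device (DM_MULTIPATH_DEVICE_PATH=1).
def multipath_path?(udev_properties)
  udev_properties.each_line.any? do |line|
    key, value = line.chomp.split('=', 2)
    key == 'DM_MULTIPATH_DEVICE_PATH' && value == '1'
  end
end

# Sample property dumps, made up for illustration.
devices = {
  'sda' => "DEVNAME=/dev/sda\nDM_MULTIPATH_DEVICE_PATH=1\n",
  'sdb' => "DEVNAME=/dev/sdb\n"
}

# Keep only the disks that are not multipath component paths.
plain_disks = devices.reject { |_name, props| multipath_path?(props) }.keys
```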
Change-Id: I088aede0cf3bd1d16a57e7cdec4e50cab2c19175
Closes-Bug: #1652788
(cherry picked from commit 8fbc4d6405)
The 'size' parameter that lshw returns for a CPU is the current CPU
clock, which changes constantly. This breaks the nailgun optimization
that expects hw data not to change between reports.
This should also reduce the probability of deadlocks in #1624230.
Change-Id: I7d9b5282991b17424d458c2612dfa4eeeb52be48
Partial-Bug: #1624230
For some devices, such as the Virtio network device [1af4:1000], the
PCI ID found in /sys/class/net is invalid.
Change-Id: I3bc514c2d57e3a7669c418e49830491041cb8f52
Closes-Bug: #1655733
Sometimes udev places device symlinks into a subdirectory, e.g.
/dev/disk/by-id/foo/device_symlink, so we must be able
to walk through such directories recursively and find all
symlinks.
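
A minimal sketch of such a recursive walk, using Ruby's standard `Find`
module. The helper name `find_symlinks` is an assumption made for this
example and is not taken from the patch.

```ruby
require 'find'

# Hypothetical helper: collect every symlink below +dir+, descending
# into nested directories such as /dev/disk/by-id/foo/.
def find_symlinks(dir)
  links = []
  Find.find(dir) do |path|
    links << path if File.symlink?(path)
  end
  links.sort
end
```

Calling `find_symlinks('/dev/disk/by-id')` would then return the links
from the top-level directory and from any subdirectory.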
Change-Id: I0749ab94e05fdf6fd12dc89c2c788a61a128b77d
Closes-Bug: #1642391
Objects of the Offloading class are used to calculate checksums for
Nailgun. Without `to_s` the object id is used, which is different
every time.
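
The effect can be illustrated with a small sketch. The classes below
are stand-ins, not the agent's real Offloading class, and the use of
SHA1 is an assumption: the message does not say how the checksum is
computed.

```ruby
require 'digest/sha1'

# Stand-in class without a custom to_s.
class WithoutToS
  def initialize(name, state)
    @name = name
    @state = state
  end
end

# Same data, but with a representation built only from that data.
class WithToS < WithoutToS
  def to_s
    "#{@name}=#{@state}"
  end
end

# Hypothetical checksum over the string form of an object.
def checksum(obj)
  Digest::SHA1.hexdigest(obj.to_s)
end

# Equal data gives equal checksums only when to_s is defined; the
# default to_s embeds an address-based id of the particular object.
stable = checksum(WithToS.new('tso', true)) == checksum(WithToS.new('tso', true))
```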
Change-Id: I07e9c3d0a88e7a674b0eeab8dc23f20987fedea6
Closes-bug: #1643008
In the _get_pci_dev_list method, .chomp must be added to the output
of `cat /etc/nailgun_systemtype` so that the comparison with the
'bootstrap' string works correctly.
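
For illustration, assuming the file content ends with a newline. The
helper name `bootstrap?` is hypothetical.

```ruby
# Hypothetical helper: +system_type_output+ is the raw command output.
def bootstrap?(system_type_output)
  system_type_output.chomp == 'bootstrap'
end

raw = "bootstrap\n"   # sample output with a trailing newline
raw == 'bootstrap'    # false: the newline breaks the comparison
bootstrap?(raw)       # true
```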
Change-Id: Id2fdc4c7b7bd7604c43803da594480bf865cf1cb
Related-Bug: #1554970
Add a hostname check and run lshw only on bootstrap nodes.
Add the sanitize param to lshw to hide any IP, MAC, etc.
Change-Id: I7739da68ab059178787ff0fe2418a54717684750
Closes-Bug: #1554970
The logic for filtering fake RAID MDs relies heavily on the
presence of the 'Container' field in the parsed data.
If this field is missing, the agent never figures out the names of
fake RAID devices and their components.
This patch adds this field to the parser.
In addition, it logs all fake RAIDs found and their components
for the sake of easy debugging.
Change-Id: I2066c5a0e995e542271cd308c9d83e2373787be4
Closes-Bug: #1617071
Ohai requires additional packages, and unfortunately
not all of them are open-source friendly
(ruby-sigar, for example).
This change lets us get rid of the ruby-mixlib*,
ruby-sigar and ruby-yajl packages.
Also, it may sound strange, but ohai[:virtualization]
makes its decision based on /proc/cpuinfo information
only (this applies only to kvm/qemu; other virt-systems
are detected correctly, AFAICS).
So, if someone chooses a processor configuration other
than the default (qemu), ohai returns incomplete
information about virtualization on a kvm-based virtual host.
Facter handles this more intelligently.
Blueprint: get-rid-of-ohai
Change-Id: Ia8021a3ab83bbf973eff548880ae10a540476b1c
ohai reports the interface name as br-bm.755@br-bm if the device is a
vlan subinterface on a bridge.
This patch skips interfaces with such names.
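
A rough sketch of such a filter, assuming that the presence of '@' in
the name is what identifies these entries. The helper name
`reportable_interfaces` is made up for this example.

```ruby
# Hypothetical filter: drop names of the form "child@parent".
def reportable_interfaces(names)
  names.reject { |name| name.include?('@') }
end

reportable_interfaces(['eth0', 'br-bm', 'br-bm.755@br-bm'])
# => ["eth0", "br-bm"]
```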
Change-Id: I17fe2276ca5e6cddd38f70f44f1275eb97814a26
Closes-Bug: #1592361
Rethtool executes two SIOCETHTOOL ioctls, first with the
ETHTOOL_GLINKSETTINGS command and then with ETHTOOL_GDRVINFO,
but some drivers (like virtio) don't implement
ETHTOOL_GLINKSETTINGS, so it fails.
I've sent a patch to the author of Rethtool, but he hasn't
replied yet, and it will also take some time to get this
fix into our Rethtool package. So let's call the ioctl
separately if Rethtool::InterfaceSettings fails.
Change-Id: Iea95e1b132a33621f7538c4b0ba43b134e2560ee
Without this fix the node does not respond with a proper PUT for the
pxe interface: the pxe interface gets plugged into the br-fw-admin
bridge and is then not reported as pxe-related.
We should check whether the interface is part of a bond and/or bridge
configuration and respond correctly.
Change-Id: Ifc1c396b0945fc5a42165b969b6924dcff5975b2
Closes-bug: #1581517
It's a bit hard to debug nailgun-agent, because it
tries to send info over http somewhere and sleeps at
the beginning. So you can't run it on your desktop, for
example.
Let's add a --dry-run option that just collects and prints
all the information.
This is a small improvement, so no bug or blueprint.
Change-Id: If7309635d40ff3263a671fddd7df20efd917097c
nailgun-agent must not change the mcollective config after
provisioning; cloud-init is responsible for that.
nailgun-agent may be started by cron before mcollective has been
reconfigured and started by cloud-init. In that case cloud-init
reconfigures mcollective and issues a start, which does nothing
because mcollective has already been started by nailgun-agent.
As a result, mcollective is left running with an incorrect default
configuration.
Change-Id: I0c6f3720943ad21e22899368832e451bc906b098
Closes-Bug: #1455489
LVM aligns the data area of a device to the io hints reported for
the block device. The optimal io size can be large for some devices
(16M, as reported in the bug), so nailgun's volume manager should
take it into account when partitioning. For that, nailgun-agent
must send this information to nailgun.
Change-Id: Idd9b778f7f21d4fe9d3fd029038664de41f93447
Partial-Bug: #1546049
Nailgun agent fails to recognize FusionIO flash storage devices as
valid block storage. Add "251", their device major number, to the
whitelist of valid block storage devices.
Change-Id: I374920f00141ffd3e4263e7c6ec340216b2c39d9
Signed-off-by: Will Kline <will@wolfdenassociates.com>
It's possible that SR-IOV is listed in capabilities but
Total VFs is equal to zero. In that case nailgun-agent should
report that SR-IOV is not available.
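
The check amounts to something like the sketch below. The helper name
and its arguments are assumptions; how the agent actually obtains the
capability flag and the Total VFs value is not shown here.

```ruby
# Hypothetical check: +has_capability+ says whether SR-IOV is listed
# in the device capabilities, +total_vfs+ is the "Total VFs" value
# (text or integer).
def sriov_available?(has_capability, total_vfs)
  has_capability ? total_vfs.to_i > 0 : false
end

sriov_available?(true, '0')   # capability listed, but no VFs
sriov_available?(true, '63')  # capability listed with usable VFs
```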
Change-Id: I564b5135831f63e591296b1c5dfbe933687bc7d8
Closes-bug: #1564630
In some cases lshw may take too long, for example when there are
a lot of partitions (more than 600). We avoid this problem by
setting a timeout for lshw.
Change-Id: I67748bc18023f3f6edce0cc20d4f0486877723b2
Closes-bug: #1559167
All USB storage devices must be filtered out by default, as this
type of device is often just emulated temporary storage for FW
upgrades and so on.
If one wants USB block devices reported to nailgun, this can be
enabled either with the cmdline option report_usb_block_devices or
with the same option added to the agent's config file.
DocImpact
Change-Id: Id609715732fd0ab393d1557b4810464fbfaf096e
Closes-Bug: #1543221
This is an all-in-one solution for parsing and interpreting
multipath devices.
Blueprint: multipath-disks-support
Change-Id: I48095d0fa6ba52545a5bd5c72026100912c7c436
With a single NUMA node `lstopo` doesn't return
information about distances, so by default distances
should be the 2D array [["1.0"]].
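
As a sketch, the fallback could look like this. The helper name
`numa_distances` and the shape of its argument are assumptions made
for illustration.

```ruby
# Hypothetical helper: +parsed+ is the distance matrix extracted from
# the lstopo output, or nil/empty when lstopo reported none.
def numa_distances(parsed)
  return [['1.0']] if parsed.nil? || parsed.empty?
  parsed
end
```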
Change-Id: I858c93e7f41b1a670bc72a80cd0c5b47fe63ef12
Partial-Bug: #1551955
Get numa_node ID for each NIC, add this info to interface meta and
report it to Nailgun.
Depends-on: I62299123b7ba783544a0b7411d5ee95bcab726f3
Change-Id: Ib75afde70da938fee95822c4575068c74efb8c93
Implements: blueprint support-dpdk
Get PCI-ID information for NIC via sysfs and report it to nailgun.
Implements: blueprint support-dpdk
Change-Id: I7a6187be1e35e428f7d868584d2c1d4a8686b0bd
- Exclude Virtual Functions from the list of interfaces
- Add SR-IOV related info for enabled devices to meta/interfaces
Change-Id: I62299123b7ba783544a0b7411d5ee95bcab726f3
Implements: blueprint #support-sriov
Implements: blueprint #sr-iov-in-nailgun-agent
There was a small error in the code that
throws an exception and prints a log message.
This patch fixes it.
Closes-Bug: #1550335
Change-Id: I6f3bd7d9554f9c296c033c89e62c3aaa739295b8
On Linux ohai gets the block device size from /sys/block/$device/size.
That size is always measured in units of 512 bytes even if the "physical"
block size of the device in question is different [1][2]. On the other hand
/sys/block/$device/queue/logical_block_size is the smallest unit which
the device can address, and /sys/block/$device/queue/physical_block_size
is the smallest unit the physical storage device can write atomically.
Typically SATA/SAS drives of 2 TB and larger have 4 KB physical
sectors and expose a 512-byte "logical" block size to the operating
system. However, some hard drives (for instance, the HGST Ultrastar
7K6000 SAS drive) expose the actual physical sector size (that is,
4 KB) for efficiency reasons. As a result nailgun-agent miscomputes
the size of such a drive (it's 8x off!).
[1] http://lxr.free-electrons.com/source/include/linux/types.h?v=4.4#L124
[2] http://lxr.free-electrons.com/source/drivers/scsi/sd.c?v=4.4#L2340
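
The arithmetic behind the 8x error, as a small sketch. The helper
name, the constant name and the sample value are made up for
illustration.

```ruby
# /sys/block/<dev>/size counts 512-byte units regardless of the
# logical or physical block size of the device.
SYSFS_SECTOR_BYTES = 512

# Hypothetical helper: +sysfs_size+ is the content of the size file.
def disk_size_bytes(sysfs_size)
  sysfs_size.to_i * SYSFS_SECTOR_BYTES
end

sectors = "1000\n"                  # sample content of the size file
correct = disk_size_bytes(sectors)  # 512_000 bytes
wrong   = sectors.to_i * 4096       # 4_096_000 bytes, 8x too large
```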
Change-Id: Iae36b11dce8e6f43d7ee4bddac5098c633883ed6
Closes-Bug: #1544816
Apparently, nailgun-agent filters out NVMe devices because of a
major number mismatch.
This wasn't an issue with the kernel shipped with CentOS 6.x,
where the major number was 253.
For newer kernel versions the valid major number is 259.
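
A sketch of a major-number check under stated assumptions: the list
below contains only the two NVMe numbers named above and is not the
agent's real whitelist, and accepting both values is an assumption.
The input is the `major:minor` text found in a device's sysfs `dev`
file.

```ruby
# Illustrative list only, not the agent's actual whitelist.
NVME_MAJORS = [253, 259].freeze

# Hypothetical helper: +dev_text+ looks like "259:0\n".
def nvme_major?(dev_text)
  major = dev_text.split(':').first.to_i
  NVME_MAJORS.include?(major)
end
```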
Change-Id: Idc572832b3a5650496439c32c0addcca4d759378
Closes-Bug: #1536055
To allow nailgun-agent to detect fake RAID devices:
* MDs with 'Raid Level' set to 'container' need to be found,
* underlying devices of such an MD should be filtered out of the
storage devices.
So, if /dev/md127 exists and is actually a container, then
* nailgun-agent should skip it from reporting, as a container
device can't be used as a block device,
* underlying devices of that container (say, /dev/sda and /dev/sdb)
shouldn't be reported as storage devices either: they can't be
used on their own, as they are part of the MD,
* only the actual MD (say, /dev/md126) which represents the fake RAID
will be reported as a storage device.
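
The rules above can be sketched as follows. The data layout (a hash
keyed by MD name with `:raid_level` and `:devices` fields) and the
helper name are assumptions made for this example, not the agent's
real structures.

```ruby
# Hypothetical filter: drop container MDs and their member devices.
def reportable_devices(block_devices, md_info)
  containers = md_info.select { |_md, info| info[:raid_level] == 'container' }
  members = containers.values.flat_map { |info| info[:devices] }
  block_devices - containers.keys - members
end

# Made-up sample of parsed MD data.
md_info = {
  '/dev/md127' => { raid_level: 'container', devices: ['/dev/sda', '/dev/sdb'] },
  '/dev/md126' => { raid_level: 'raid1', devices: ['/dev/sda', '/dev/sdb'] }
}
all_devices = ['/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/md126', '/dev/md127']

reportable_devices(all_devices, md_info)
# => ["/dev/sdc", "/dev/md126"]
```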
Change-Id: I48d3e52cb0f051e6e20fd57e3d9f15e8db1c99aa
Co-Authored-By: Serhii Lystopad <slystopad@mirantis.com>
Related-Bug: #1508908