You need to be familiar with these sections:
LAVA is complex and administering a LAVA instance can be an open-ended task covering a wide range of skills.
These rules may seem harsh, obvious or tedious. However, multiple people have skipped one or more of these requirements and have learnt the hard way that these steps provide valuable advice and assistance that can dramatically improve your experience of LAVA. Everyone setting up LAVA is strongly advised to follow all of these rules.
There are a number of common fallacies relating to automation. Check your test ideas against these before starting to make your plans:
connect & test seems simple enough - it doesn’t seem as if you need to deploy a new kernel or rootfs every time, no need to power off or reboot between tests. Just connect and run stuff. After all, you already have a way to manually deploy stuff to the board.
test everything at the same time - you’ve built an entire system and now you put the entire thing onto the device and do all the tests at the same time. There are numerous problems with this approach:
I already have builds - this may be true; however, automation puts extra demands on what those builds are capable of supporting. When testing manually, there are any number of points at which a human will decide that something needs to be entered, tweaked, modified, removed or ignored, and the automated system needs to be able to handle all of these cases. Examples include:
Make use of the standard files for known working device types. These files come with details of how to rebuild them, logs of each build and checksums to verify that the download is correct.
Automation can do everything - it is not possible to automate every test method. Some kinds of tests and some kinds of devices lack critical elements that block automation. These are not problems in LAVA, these are design limitations of the kind of test and the device itself. Your preferred test plan may be infeasible to automate and some level of compromise will be required.
Users are all admins too - this will come back to bite! However, there are other ways in which this can occur even after administrators have restricted users to limited access. Test jobs (including hacking sessions) have full access to the device as root. Users can therefore modify the device during a test job, and what happens next depends on the hardware support and configuration of the device. Some devices store bootloader configuration in files which are accessible from userspace after boot. Some devices lack a management interface that can intervene when a device fails to boot. Put these two together and admins can face a situation where a test job has corrupted, overridden or modified the bootloader configuration such that the device no longer boots without intervention. Some operating systems require a debug setting to be enabled before the device will be visible to the automation (e.g. the Android Debug Bridge). It is trivial for a user to mistakenly deploy a default or production system which does not have this modification.
Administrators need to be mindful of the situations in which users can (mistakenly or otherwise) modify the device configuration such that the device is unable to boot without intervention when the next job starts. This is one of the key reasons for health checks to run sufficiently often that the impact on other users is minimised.
The ongoing roles of administrators include:
When you come across problems with your LAVA instance, there are some basic information sources, methods and tools which will help you identify the problem(s).
LAVA uses Jinja2 to allow devices to be configured using common data blocks, inheritance and the device-specific device dictionary. Templates are installed into:
/etc/lava-server/dispatcher-config/device-types/
Note
Although these are configuration files and package updates will respect any changes you make, please talk to us about changes to existing templates maintained within the lava-server package.
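Device-specific settings do not go into the packaged templates; instead, each device has a small device dictionary which extends one of these templates using standard Jinja2 syntax. The sketch below is only illustrative: the file location, hostname and variable names are assumptions which depend on your LAVA version, so check the qemu.jinja2 template for the variables it actually supports.
$ cat /etc/lava-server/dispatcher-config/devices/qemu01.jinja2
{# inherits everything from the qemu device-type template #}
{% extends 'qemu.jinja2' %}
{% set memory = 1024 %}
{% set arch = 'amd64' %}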
lava-master - controls all V2 test jobs after devices have been assigned. Logs are created on the master:
/var/log/lava-server/lava-master.log
lava-scheduler - controls how all devices are assigned. Control will be handed over to lava-master once V1 code is removed. Logs are created on the master:
/var/log/lava-server/lava-scheduler.log
lava-slave - controls the operation of the test job on the slave. Includes details of the test results recorded and job exit codes. Logs are created on the slave:
/var/log/lava-dispatcher/lava-slave.log
slave logs are normally transmitted to the master but will also appear in /tmp/lava-dispatcher/slave/ in directories named after the job ID. Logs include:
job validation - the master retains a copy of the output from the validation of the test job. Currently, this validation occurs on the master but may move to the slave in future. The log is stored on the master as the lavaserver user - so for job ID 4321:
$ sudo su lavaserver
$ ls /var/lib/lava-server/default/media/job-output/job-4321/description.yaml
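While a test job is running, it also helps to follow the daemon logs described above on the master and the slave at the same time, and then check the per-job directory on the slave (using the same illustrative job ID 4321):
(on the master)
$ sudo tail -f /var/log/lava-server/lava-master.log
(on the slave)
$ sudo tail -f /var/log/lava-dispatcher/lava-slave.log
$ ls /tmp/lava-dispatcher/slave/4321/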
Note
If you are considering using MultiNode in your Test Plan, now is the time to ensure that MultiNode jobs can run successfully on your instance.
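A simple check is to submit a small MultiNode test job which uses two of your QEMU devices and confirm that both roles start, synchronise and complete. As a sketch, assuming lava-tool is installed and already authenticated against your instance with lava-tool auth-add, and using placeholder names for the user, instance and job file:
$ lava-tool submit-job https://<username>@<your.instance>/RPC2/ multinode-qemu.yaml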
Once you have a couple of QEMU devices running and you are happy with how to maintain, debug and test using those devices, start adding known working devices. These are devices which already have templates in:
/etc/lava-server/dispatcher-config/device-types/
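To see which device types already have templates on your instance, list that directory; each filename matches a device type name. The exact set depends on your installed version of lava-server, so the output below is only indicative:
$ ls /etc/lava-server/dispatcher-config/device-types/
base.jinja2  beaglebone-black.jinja2  qemu.jinja2  ...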
The majority of the known device types are low-cost ARM developer boards which are readily available. Even if you are not going to use these boards for your main testing, it is recommended that you obtain a couple of them, as this will make it substantially easier to learn how to administer LAVA for any devices other than emulators.
Physical hardware like these dev-boards has requirements such as:
Understanding how all of those bits fit together to make a functioning LAVA instance is much easier when you use devices which are known to work in LAVA.
Early administration tasks: