Zombie tasks are dumped in dump_zombies() so it is redundant to handle them
in dump_one_task().
Deprecate cg_set in task_core_entry as this field must be per thread now.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
|
|
Some users on Raspberry Pi report that the kerndat check for
memfd_create(MFD_HUGETLB) support returns ENOSYS even when the memfd_create
syscall is available. We currently treat this error as unexpected and
return an error. This commit marks memfd_create(MFD_HUGETLB) as
unavailable when ENOSYS is returned.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
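A minimal sketch of the probe classification described above. The enum and
the helper name are hypothetical, not CRIU's own. Treating EINVAL (a flag the
kernel does not know) as "unavailable" is an assumption; only ENOSYS comes
from the commit message.

```c
#include <errno.h>

/* Possible outcomes of probing memfd_create(MFD_HUGETLB). */
enum probe_result {
	PROBE_AVAILABLE,   /* the call succeeded */
	PROBE_UNAVAILABLE, /* the feature is missing; not an error */
	PROBE_ERROR        /* anything else is unexpected */
};

/*
 * Classify the result of the probe. `ret` is the return value of
 * memfd_create() and `err` is errno captured right after the call.
 */
static enum probe_result classify_hugetlb_probe(int ret, int err)
{
	if (ret >= 0)
		return PROBE_AVAILABLE;
	/* ENOSYS: reported on Raspberry Pi. EINVAL: assumed for unknown flags. */
	if (err == ENOSYS || err == EINVAL)
		return PROBE_UNAVAILABLE;
	return PROBE_ERROR;
}
```

A caller would record PROBE_UNAVAILABLE in its feature cache and continue,
failing only on PROBE_ERROR.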
|
|
A previous commit added cgroup cpuset unmounting to
scripts/ci/Makefile. However, we sometimes run in a container without the
necessary privileges to unmount certain cgroups.
This commit moves the cgroup unmounting to a place in run-ci-tests.sh
which already requires privileged access and does not break unprivileged
build-only CI runs.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
The cgroupv2_00 and cgroupv2_01 tests need cpuset in the cgroup-v2 hierarchy to
check that CRIU handles cgroup-v2 properly, so unmount cpuset in cgroup-v1 to
make it move to cgroup-v2.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
|
|
This test creates a process with 2 threads in different threaded controllers and
checks whether CRIU restores these threads' cgroup controllers properly.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
|
|
As threads in a process may be in different threaded controllers, we need to
move those threads to the correct controllers.
Because the threads of a process are restored at a later stage in restorer.c, we
need to create a cgroupd service to help move those threads into the correct
controllers when they are restored. We cannot use usernsd because the code in the
restorer does not know the address of the outside function to pass to userns_call.
However, this cgroupd service still reuses a lot of code from usernsd.
The main logic is that restored threads receive the cg_set number they belong to
before the restorer stage if their cg_set differs from the main thread's. When
these threads are restored, they send the cg_set number and their thread ids
through a unix socket to cgroupd. cgroupd receives the cg_set number and thread
ids and moves those threads into the correct controllers. Thread ids are sent
through SCM_CREDENTIALS of the unix socket so that they are translated into the
correct thread ids on the receiving end.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
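The last step on the cgroupd side, moving one thread into its controller, can
be sketched as follows. In cgroup-v2 a single thread is moved by writing its
TID to the cgroup.threads file of the target directory. The helper name and
signature are hypothetical, and the socket handling is left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Move one thread into a cgroup-v2 directory by writing its TID to
 * the cgroup.threads file there. Returns 0 on success, -1 on error.
 */
static int move_thread_to_cgroup(const char *cgroup_dir, pid_t tid)
{
	char path[4096], buf[32];
	int fd, len, ret = -1;

	len = snprintf(path, sizeof(path), "%s/cgroup.threads", cgroup_dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	len = snprintf(buf, sizeof(buf), "%d", (int)tid);

	/* The file already exists on cgroupfs, so no O_CREAT here. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) == len)
		ret = 0;
	close(fd);
	return ret;
}
```

The TID passed in must be valid in the namespace of the writer, which is why
the message above relies on SCM_CREDENTIALS for the translation.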
|
|
Currently, we assume all threads in a process are in the same cgroup controllers.
However, with threaded controllers, threads in a process may be in different
controllers. So we need to dump the cgroup controllers of every thread in a process
and fix up the procfs cgroup parsing to parse from self/task/<tid>/cgroup.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
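A small sketch of the per-thread lookup described above: build the procfs path
for one thread and pull the cgroup path out of one line of that file. Each
line has the form "hierarchy-ID:controller-list:cgroup-path". Both helper
names are hypothetical and not taken from CRIU.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Build the procfs path holding the cgroup membership of one thread.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int thread_cgroup_path(char *buf, size_t size, pid_t pid, pid_t tid)
{
	int len = snprintf(buf, size, "/proc/%d/task/%d/cgroup", (int)pid, (int)tid);

	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/*
 * Return a pointer to the cgroup path inside one line such as
 * "0::/user.slice", or NULL if the line is malformed.
 */
static const char *cgroup_line_path(const char *line)
{
	const char *p = strchr(line, ':');	/* end of the hierarchy ID */

	if (!p)
		return NULL;
	p = strchr(p + 1, ':');			/* end of the controller list */
	return p ? p + 1 : NULL;
}
```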
|
|
Check that CRIU can checkpoint/restore global properties in cgroup-v2 properly.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
|
|
Add write_value/read_value helpers to the zdtm library to write/read a buffer
to/from files.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
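A minimal sketch of what such helpers could look like. The names carry a
sketch_ prefix because the signatures and error handling here are
assumptions, not the ones used in the zdtm library.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a NUL-terminated string to an existing file. 0 on success, -1 on error. */
static int sketch_write_value(const char *path, const char *value)
{
	ssize_t len = strlen(value);
	int fd, ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write(fd, value, len) == len ? 0 : -1;
	close(fd);
	return ret;
}

/* Read at most size - 1 bytes and NUL-terminate. 0 on success, -1 on error. */
static int sketch_read_value(const char *path, char *value, size_t size)
{
	ssize_t n;
	int fd;

	if (size == 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, value, size - 1);
	close(fd);
	if (n < 0)
		return -1;
	value[n] = '\0';
	return 0;
}
```

A cgroup test would typically write a property with the first helper, run
checkpoint/restore, and compare what the second helper reads back.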
|
|
This commit adds checkpoint/restore support for some new global properties in
cgroup-v2:
	cgroup.subtree_control
	cgroup.max.descendants
	cgroup.max.depth
	cgroup.freeze
	cgroup.type
Only cgroup.subtree_control and cgroup.type need extra code.
Each controller name written to cgroup.subtree_control must carry a "+" or "-"
prefix, and cgroup.type can only be written with the value "threaded", which
makes the controller threaded. cgroup.type is a special property because it
must be restored before any process can move into this controller.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
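The prefix handling for cgroup.subtree_control can be sketched as below.
Reading the file yields a space-separated list such as "cpu memory", while a
write expects "+cpu +memory". The helper name is hypothetical and this is not
the code from the commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn a dumped cgroup.subtree_control value such as "cpu memory" into
 * the form accepted on write, "+cpu +memory". Returns 0 on success,
 * -1 if `out` is too small.
 */
static int subtree_control_to_write(const char *dumped, char *out, size_t size)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';

	while (*dumped) {
		/* Length of one controller name. */
		size_t len = strcspn(dumped, " \n");

		if (len > 0) {
			int n = snprintf(out + used, size - used, "%s+%.*s",
					 used ? " " : "", (int)len, dumped);

			if (n < 0 || (size_t)n >= size - used)
				return -1;
			used += n;
		}
		dumped += len;
		if (*dumped)
			dumped++;	/* skip the separator */
	}
	return 0;
}
```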
|
|
It seems like drone.io no longer provides free aarch64/armhf CI runs.
This switches the aarch64 CI runs to Cirrus CI. armhf CI runs have been
dropped for now as they are not directly supported.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
Since commit https://github.com/torvalds/linux/commit/5563cabdde, a user with
sufficient capabilities can open IPC sysctl files and write to them. Therefore, we
no longer need the usernsd process in the outside user namespace to help with
that. Furthermore, some later commits,
https://github.com/torvalds/linux/commit/1f5c135ee5 and
https://github.com/torvalds/linux/commit/0889f44e28, bind the IPC namespace to
the opened file descriptor of the IPC sysctl at open() time, so the changed value
no longer depends on the IPC namespace at write() time. This breaks the
current usernsd approach.
So, we first try to open and write IPC sysctl files directly in the context of
the restored process, without usernsd help. This approach succeeds on newer
kernels since the restored process has enough capabilities at this restore stage.
On older kernels, the open() fails and we fall back to the usernsd approach.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
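The "try directly, then fall back" order can be sketched like this. The
function names and the callback type are hypothetical. The callback stands in
for the usernsd path, which is not reproduced here, and counting_fallback
exists only to make the sketch observable.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for the usernsd-based write used on older kernels. */
typedef int (*sysctl_fallback_t)(const char *path, const char *value);

static int fallback_calls;

/* Demo fallback that only counts how often it was needed. */
static int counting_fallback(const char *path, const char *value)
{
	(void)path;
	(void)value;
	fallback_calls++;
	return 0;
}

/*
 * Try to open and write the sysctl file in the current context first.
 * Only when open() fails is the fallback asked to do the write.
 */
static int write_ipc_sysctl(const char *path, const char *value,
			    sysctl_fallback_t fallback)
{
	ssize_t len = strlen(value);
	int fd, ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return fallback(path, value);

	ret = write(fd, value, len) == len ? 0 : -1;
	close(fd);
	return ret;
}
```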
|
|
In Virtuozzo we've faced an out-of-bounds access when calling this function
on a short path string, which corrupted other memory and led to a
segmentation fault. So it may be useful to have this comment in the code to
avoid such misuse of this function in the future.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
|
|
Run the env00 and pthread00 tests as non-root as an initial proof of concept.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
These are the minimal changes to make zdtm.py successfully run the
env00 and pthread00 test cases as non-root using the '--rootless' zdtm option.
Co-authored-by: Younes Manton <ymanton@ca.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
This adds the non-root section and information about the parameter
--unprivileged to the man page.
Co-authored-by: Anna Singleton <annabeths111@gmail.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Anna Singleton <annabeths111@gmail.com>
|
|
This patch modifies how kerndat is handled in unprivileged mode.
Initialization and functionality that can only be done as root is
made separate from common code. The kerndat file's location is
defined as $XDG_RUNTIME_DIR/criu.kdat in unprivileged mode. Since
we expect that directory to be on tmpfs we maintain the same behavior
as the root-mode kerndat which lives in /run.
Co-authored-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
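A minimal sketch of the path selection described above. The function name is
hypothetical, and the root-mode file name /run/criu.kdat is an assumption
derived from "lives in /run".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pick the kerndat cache location. In unprivileged mode the file lives
 * under $XDG_RUNTIME_DIR. Returns 0 on success, -1 if the variable is
 * unset or empty, or if the buffer is too small.
 */
static int kerndat_path(char *buf, size_t size, int unprivileged)
{
	int len;

	if (unprivileged) {
		const char *dir = getenv("XDG_RUNTIME_DIR");

		if (!dir || strlen(dir) == 0)
			return -1;	/* no per-user runtime directory */
		len = snprintf(buf, size, "%s/criu.kdat", dir);
	} else {
		len = snprintf(buf, size, "/run/criu.kdat");
	}
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```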
|
|
This commit enables checkpointing and restoring of applications as
non-root.
The first goal was to enable checkpoint and restore of the env00 and
pthread00 test cases.
This uses the information from opts.unprivileged and opts.cap_eff to
skip certain code paths which do not work as non-root.
Co-authored-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
This adds the function check_caps() which checks if CRIU is running
with at least CAP_CHECKPOINT_RESTORE. That is the minimum capability
CRIU needs to do a minimal checkpoint and restore from it.
In addition, helper functions are added to easily query for other
capabilities for enhanced checkpoint/restore support.
Co-authored-by: Younes Manton <ymanton@ca.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
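The capability test can be sketched as a bit check on the effective set. The
names below are hypothetical and the two-word layout is an assumption about
how the set is stored. CAP_CHECKPOINT_RESTORE is capability number 40 in the
kernel's uapi headers.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef CAP_CHECKPOINT_RESTORE
#define CAP_CHECKPOINT_RESTORE 40
#endif

/* The effective set is kept as two 32-bit words: caps 0-31 and 32-63. */
static bool has_cap(const uint32_t cap_eff[2], unsigned int cap)
{
	if (cap >= 64)
		return false;
	return (cap_eff[cap / 32] & (UINT32_C(1) << (cap % 32))) != 0;
}

/* Returns 0 if the minimum capability is present, -1 otherwise. */
static int check_caps_sketch(const uint32_t cap_eff[2])
{
	return has_cap(cap_eff, CAP_CHECKPOINT_RESTORE) ? 0 : -1;
}
```

Helpers for other capabilities would be thin wrappers around has_cap() with a
different capability number.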
|
|
The idea behind the rootless CRIU code is that CRIU reads out its
effective capabilities and stores them in the global opts structure.
Different parts of CRIU can then, based on the existing capabilities,
automatically enable or disable certain code paths.
Currently at least CAP_CHECKPOINT_RESTORE is required. CRIU will not
start without this capability.
Signed-off-by: Adrian Reber <areber@redhat.com>
|
|
python2-future, python2-junit_xml, python-flake8 and libbsd-devel are
now provided from EPEL.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
|
|
The ppc64le ABI allows functions to store data in caller frames.
When initializing the stack pointer prior to executing parasite code
we need to pre-allocate the minimum sized stack frame before
jumping to the parasite code.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
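The arithmetic behind the pre-allocation, as a sketch. The function and macro
names are hypothetical. The 32-byte minimum frame and 16-byte alignment are
the ELFv2 ABI values and are assumed to be the ones that apply here.

```c
#include <stdint.h>

#define STACK_ALIGN	16u	/* stack pointer alignment in bytes */
#define MIN_FRAME_SIZE	32u	/* minimum frame a callee may write into */

/*
 * Compute the initial stack pointer for code that starts running on a
 * fresh stack: align the top downwards, then reserve one minimum-sized
 * frame so that stores into the "caller frame" stay inside the stack.
 */
static uintptr_t initial_parasite_sp(uintptr_t stack_top)
{
	uintptr_t sp = stack_top & ~(uintptr_t)(STACK_ALIGN - 1);

	return sp - MIN_FRAME_SIZE;
}
```

Without the subtraction, the first function to run would write its saved
registers above the top of the stack and corrupt whatever follows it.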
|
|
Some ABIs allow functions to store data in the caller's frame, which
means that we have to allocate an initial stack frame before
executing code on the parasite stack.
This test saves the contents of writable memory that follows the stack
after the victim has been infected but before we start using the
parasite stack. It later checks that the saved data matches the
current contents of the two memory areas. This is done while the
victim is halted so we expect a match unless executing parasite code
caused memory corruption. The test doesn't detect cases where we
corrupted memory by writing the same value.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
return zero on chk success
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Co-authored-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
|
|
Starting the daemon is the first time we run code in the victim
using the parasite stack.
It's useful for testing to be able to infect the victim without starting
the daemon so that we can inspect the victim's state, set up stack
guards, and so on before stack-related corruption can happen.
Add compel_infect_no_daemon() to infect the victim but not start the
daemon and compel_start_daemon() to start the daemon after the victim
is infected.
Add compel_get_stack() to get the victim's main and thread parasite
stacks.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
Signed-off-by: Liu Hua <weldonliu@tencent.com>
|
|
In fact an array (aptly named array) is already used in run_test2,
so let's just make it an array right from the start.
While at it, remove ls invocation.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
This basically replaces
	for x in $(sed ...); do
with
	sed ... | while IFS= read -r x; do
The only caveat is that the sed program was amended to remove empty lines
(there was one right above the PB_AUTOGEN_STOP).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
Those are no longer needed with shellcheck 0.8.0.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
This is a preferred way of fixing SC2086 shellcheck warning.
Note that since ZDTM_OPTS is passed as a string (via make or docker),
we are converting it to an array using read -a.
Remove all "shellcheck disable=SC2086" annotations.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
This is easy to fix (but we have to specify -x).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
We can use globstar bash feature instead of find in this case.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
It is ok to quote $@, as it expands to "$1" "$2" ...
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
Instead of using shellcheck v0.7.2 from fedora repo,
let's install the latest version (v0.8.0).
This allows removing some "shellcheck disable=..." annotations,
and (I hope) gives better checking quality overall.
While at it, remove findutils from dnf install as this package is
already installed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
|
|
When we restore a shell-job we inherit ttys, so even if we don't
have the right mount for it in the container on dump, on restore it should
just be right.
Otherwise, when dumping a second time via criu-ns we get:
(00.005678) Error (criu/files-reg.c:1710): Can't lookup mount=29 for fd=0 path=/dev/pts/20
Fixes: #1893
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
|
|
When we are restoring in a new pidns we specifically do setsid() from
criu-ns init so that sids of restored tasks are non-zero in this pidns
and on the next dump CRIU would not have problems with zero sids, see [1].
But after this CRIU tries to inherit and set up a tty for the restored
process, and it fails to set its process group via TIOCSPGRP to be the
foreground group for its tty, because the tty is already a controlling tty
for another session (the one we had before setsid).
So to make the restore work we need to reset the tty to be a controlling tty of
criu-ns init via TIOCSCTTY before calling criu.
Otherwise, when restoring for the first time via criu-ns (from a criu-ns dump) we get:
Error (criu/tty.c:689): tty: Failed to set group 40816 on 0: Inappropriate ioctl for device
https://github.com/checkpoint-restore/criu/issues/232 [1]
v2: add why and what comment in code, set controlling tty only for
--shell-job and fail if stdin is not a tty.
Fixes: #1893
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
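criu-ns itself is a Python script, so the following C sketch only illustrates
the ioctl involved. The function name is hypothetical. Passing 1 as the
argument, which asks the kernel to take the terminal away from another
session and needs privilege, is an assumption about what the fix requires.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Make the terminal behind `fd` the controlling tty of the calling
 * session leader. Fails with ENOTTY if `fd` is not a terminal.
 */
static int make_controlling_tty(int fd)
{
	if (!isatty(fd)) {
		errno = ENOTTY;
		return -1;
	}
	/* Assumed: argument 1, so a tty owned by the old session is taken over. */
	return ioctl(fd, TIOCSCTTY, 1);
}
```

This would be called on stdin after setsid() and before starting criu, and
only when --shell-job is in effect.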
|
|
A recent change in glibc introduced `enum fsconfig_command` [1] and as a
result the compilation of criu fails with the following errors
In file included from criu/pie/util.c:3:
/usr/include/sys/mount.h:240:6: error: redeclaration of 'enum fsconfig_command'
240 | enum fsconfig_command
| ^~~~~~~~~~~~~~~~
In file included from /usr/include/sys/mount.h:32:
criu/include/linux/mount.h:11:6: note: originally defined here
11 | enum fsconfig_command {
| ^~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:242:3: error: redeclaration of enumerator 'FSCONFIG_SET_FLAG'
242 | FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
| ^~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:12:9: note: previous definition of 'FSCONFIG_SET_FLAG' with type 'enum fsconfig_command'
12 | FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
| ^~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:244:3: error: redeclaration of enumerator 'FSCONFIG_SET_STRING'
244 | FSCONFIG_SET_STRING = 1, /* Set parameter, supplying a string value */
| ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:14:9: note: previous definition of 'FSCONFIG_SET_STRING' with type 'enum fsconfig_command'
14 | FSCONFIG_SET_STRING = 1, /* Set parameter, supplying a string value */
| ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:246:3: error: redeclaration of enumerator 'FSCONFIG_SET_BINARY'
246 | FSCONFIG_SET_BINARY = 2, /* Set parameter, supplying a binary blob value */
| ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:16:9: note: previous definition of 'FSCONFIG_SET_BINARY' with type 'enum fsconfig_command'
16 | FSCONFIG_SET_BINARY = 2, /* Set parameter, supplying a binary blob value */
| ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:248:3: error: redeclaration of enumerator 'FSCONFIG_SET_PATH'
248 | FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */
| ^~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:18:9: note: previous definition of 'FSCONFIG_SET_PATH' with type 'enum fsconfig_command'
18 | FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */
| ^~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:250:3: error: redeclaration of enumerator 'FSCONFIG_SET_PATH_EMPTY'
250 | FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */
| ^~~~~~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:20:9: note: previous definition of 'FSCONFIG_SET_PATH_EMPTY' with type 'enum fsconfig_command'
20 | FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */
| ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:252:3: error: redeclaration of enumerator 'FSCONFIG_SET_FD'
252 | FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */
| ^~~~~~~~~~~~~~~
criu/include/linux/mount.h:22:9: note: previous definition of 'FSCONFIG_SET_FD' with type 'enum fsconfig_command'
22 | FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */
| ^~~~~~~~~~~~~~~
/usr/include/sys/mount.h:254:3: error: redeclaration of enumerator 'FSCONFIG_CMD_CREATE'
254 | FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */
| ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:24:9: note: previous definition of 'FSCONFIG_CMD_CREATE' with type 'enum fsconfig_command'
24 | FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */
| ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:256:3: error: redeclaration of enumerator 'FSCONFIG_CMD_RECONFIGURE'
256 | FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */
| ^~~~~~~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:26:9: note: previous definition of 'FSCONFIG_CMD_RECONFIGURE' with type 'enum fsconfig_command'
26 | FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */
This patch adds a definition for FSOPEN_CLOEXEC to solve this problem. In particular,
sys/mount.h includes an #ifndef check for FSOPEN_CLOEXEC surrounding `enum fsconfig_command`.
[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=7eae6a91e9b1670330c9f15730082c91c0b1d570
Reported-by: Younes Manton (@ymanton)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
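The guard mechanism can be shown in one self-contained file. The first part
plays the role of the project's own header, the second part imitates the
guarded block that the message says sys/mount.h contains. The enumerator
values are the ones from the error output above. The layout is an
illustration, not a copy of either header.

```c
/* Part 1: local definitions, as a project header would provide them. */
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0,
	FSCONFIG_SET_STRING = 1,
	FSCONFIG_SET_BINARY = 2,
	FSCONFIG_SET_PATH = 3,
	FSCONFIG_SET_PATH_EMPTY = 4,
	FSCONFIG_SET_FD = 5,
	FSCONFIG_CMD_CREATE = 6,
	FSCONFIG_CMD_RECONFIGURE = 7,
};

/* Defining this macro is what tells the later header to skip its copy. */
#define FSOPEN_CLOEXEC 0x00000001

/* Part 2: imitation of the guarded block in the system header. */
#ifndef FSOPEN_CLOEXEC
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0,
	/* remaining enumerators omitted in this imitation */
};
#endif
```

Removing the #define in part 1 brings back the "redeclaration of enum"
error, which is the failure the patch addresses.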
|
|
This patch changes top-level OpenJ9 filename and data references to Java
to make them generic and launches tests against both HotSpot and OpenJ9
JVMs.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
Semeru builds (which use OpenJ9 instead of HotSpot) are the successors
of AdoptOpenJDK's OpenJ9 builds.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
We used to pull AdoptOpenJDK's OpenJ9 builds but switched to
Eclipse Temurin, which uses the HotSpot VM instead of OpenJ9.
Rename the corresponding Dockerfiles to hotspot.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
The entry "build/" will ignore any directory named "build" at any level
of the source tree, including our scripts/build directory. We only want
to ignore the top-level build directory created by `make install`.
As the git manpage suggests, entries with slashes at the start or in the
middle will only match at the same level as the .gitignore, hence use
build/** instead.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
|
|
Check that CRIU handles non-empty listen queues properly.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
[mclapinski@google.com: update test_doc and test_author]
Signed-off-by: Michal Clapinski <mclapinski@google.com>
|
|
This makes test code more compact:
	if (ret == -1) {
		pr_perror("XXX");
		return 1;
	}
vs
	if (ret == -1)
		return pr_perror("XXX");
Signed-off-by: Andrei Vagin <avagin@gmail.com>
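One way to get this behavior is a macro that prints and then evaluates to 1,
matching the "return 1" in the longer form. The macro below is a hypothetical
stand-in and zdtm's real pr_perror differs in detail. It relies on the GNU
##__VA_ARGS__ extension and needs a string literal as the format.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print the message plus the errno text, then evaluate to 1. */
#define pr_perror_sketch(fmt, ...) \
	(fprintf(stderr, "ERR: " fmt ": %s\n", ##__VA_ARGS__, strerror(errno)), 1)

/* Example user: returns 0 on success and 1 (after logging) on failure. */
static int open_or_fail(const char *path)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return pr_perror_sketch("Can't open %s", path);
	fclose(f);
	return 0;
}
```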
|
|
Before this change, CRIU would just lose that data upon migration. So
it's better to fail migration in this case.
To reproduce the bug one can:
1. Create an AF_UNIX socket and call listen on it.
2. Create a second AF_UNIX socket and call connect to the first one.
3. Send the data to the second socket.
4. Migrate.
5. Call accept on the first socket and then read. There would be no data
available.
It should be even possible to close the second socket before migration.
This would cause accept to hang because CRIU totally misses a closed
in-flight socket.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
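The reproduction steps, minus the migration itself, can be sketched as one
function. Without a checkpoint/restore in between, the data is still there
when the connection is finally accepted. The function name and the socket
path passed in are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Queue one connection with unread data on a listening AF_UNIX socket,
 * then accept it and read the data back. Returns the number of bytes
 * read, or -1 on error.
 */
static ssize_t inflight_roundtrip(const char *path, const char *msg,
				  char *out, size_t size)
{
	struct sockaddr_un addr;
	int lsk, csk, ask;
	ssize_t n = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	unlink(path);

	lsk = socket(AF_UNIX, SOCK_STREAM, 0);
	csk = socket(AF_UNIX, SOCK_STREAM, 0);
	if (lsk < 0 || csk < 0)
		goto out;

	/* 1. Listening socket. */
	if (bind(lsk, (struct sockaddr *)&addr, sizeof(addr)) || listen(lsk, 1))
		goto out;
	/* 2. Connect: the connection now sits in the listen queue. */
	if (connect(csk, (struct sockaddr *)&addr, sizeof(addr)))
		goto out;
	/* 3. Send data before anyone calls accept(). */
	if (write(csk, msg, strlen(msg)) < 0)
		goto out;
	/* 4. A migration would happen at this point. */
	/* 5. Accept the queued connection and read. */
	ask = accept(lsk, NULL, NULL);
	if (ask >= 0) {
		n = read(ask, out, size);
		close(ask);
	}
out:
	if (lsk >= 0)
		close(lsk);
	if (csk >= 0)
		close(csk);
	unlink(path);
	return n;
}
```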
|
|
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
x86 uses a hardware breakpoint to accelerate syscall tracing
instead of `ptrace(PTRACE_SYSCALL)`. arm64 has the same
capability according to <<Learn the architecture: Armv8-A self-hosted
debug>>[[1]].
<<Arm Architecture Reference Manual for A-profile architecture>>[[2]]
describes the usage in detail:
- D2.8 Breakpoint Instruction exceptions
- D2.9 Breakpoint exceptions
- D13.3.2 DBGBCR<n>_EL1, Debug Breakpoint Control Registers, n
Note:
[1]: https://developer.arm.com/documentation/102120/0100
[2]: https://developer.arm.com/documentation/ddi0487/latest
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
Breakpoints are used to stop as close as possible to a target system call.
First, we don't need them after this point.
Second, PTRACE_CONT can't pass through a breakpoint on arm64.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
When delivering system call traps, set bit 7 in the signal number (i.e.,
deliver SIGTRAP|0x80). This makes it easy for the tracer to distinguish
normal traps from those caused by a system call.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
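On the tracer side, the distinction can be sketched as a check on the wait
status. The helper name is hypothetical. It assumes the tracee was set up so
that syscall stops are reported as SIGTRAP|0x80, as described above.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Return true if a status from waitpid() describes a stop caused by a
 * system call rather than an ordinary SIGTRAP.
 */
static bool is_syscall_stop(int status)
{
	return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

A plain SIGTRAP stop, for example from a breakpoint, makes the helper return
false, so the tracer no longer has to guess which kind of trap it got.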
|