Age | Commit message | Author |
|
Here are two more easy but annoying bugs fixed before we
tag 1.6 -- one with multi-threaded task restore and
the other one with the processing of big rpc messages.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
In case of rst_sibling, we are tracing the root task, but we are tracing
only the leader thread, so we must attach to other threads to stop them.
Reported-by: Ross Boucher <rboucher@gmail.com>
Cc: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Currently we use a static buffer, but it is too small.
Error (cr-service.c:58): Failed unpacking request: Success
Error (cr-service.c:694): Can't recv request: Success
data too short after length-prefix of 1217
v2: use recv instead of recvmsg
Reported-by: Ross Boucher <rboucher@gmail.com>
Cc: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
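
The idea of sizing the buffer to the message, rather than using a fixed
static buffer, can be sketched as follows. This is only an illustration
under assumptions: the helper name recv_whole_msg() is hypothetical, the
socket is assumed to be a message-oriented unix socket (SOCK_SEQPACKET),
and the code is not taken from cr-service.c.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: receive one whole message from a message-oriented
 * socket into a heap buffer sized to fit it. Returns the message length
 * and stores the buffer in *out (the caller frees it), or -1 on error.
 */
static ssize_t recv_whole_msg(int sk, unsigned char **out)
{
	unsigned char probe, *buf;
	ssize_t len, got;

	/* MSG_PEEK leaves the message queued, MSG_TRUNC reports its real size. */
	len = recv(sk, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	buf = malloc(len ? (size_t)len : 1);
	if (!buf)
		return -1;

	/* Now the buffer is large enough, so nothing gets cut off. */
	got = recv(sk, buf, (size_t)len, MSG_TRUNC);
	if (got != len) {
		free(buf);
		return -1;
	}

	*out = buf;
	return len;
}
```

With a buffer sized this way, a request longer than any compiled-in limit
(such as the 1217-byte payload in the log above) reaches the unpacker intact.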
|
|
With 1.5.1 we forestall a criu crash on the soon-to-be-released
4.0 kernel caused by an uninitialized ss in the restore sigframe.
Another thing we "fix" is legalizing the swrk API and adding the
ability to inherit fds via it. This is required for
libcontainer & Docker C/R and we want it to be available
before the 1.6 release in June.
Plus we included the cgroup yard destruction and properties
restore fixes for problems spotted soon after 1.5.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
inotify_irmap creates files in /etc so it should be able to do
this from userns.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
libcontainer saves PID in a state file.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Currently, if criu fails before prepare_cgroup_properties(),
cgyard isn't unmounted.
I think it's déjà vu, but it isn't :)
commit 28b0e16d730ec21a515b2686961c6312816c47f3
Author: Andrew Vagin <avagin@openvz.org>
Date: Mon Aug 25 14:29:00 2014 +0400
cgroup: call fin_cgroup() on error paths
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Saied Kazemi <saied@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
We need to finalize the cg yard both on successful cgroup restore and on a
failed restore. Further, we should restore the cgroup properties before
allowing the task to continue in all modes (previously properties were only
restored correctly in --restore-detached mode).
CC: Saied Kazemi <saied@google.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
If the --restore-detached command line option is not specified during
restore, CRIU should unmount and remove the temporary cgyard directory
tree before waiting for the restored process to exit. Otherwise, all
the temporary cgyard mount points will remain mounted and visible.
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
This is required to use criu swrk in libcontainer.
v2: remove useless function declaration
allow to set inherit_fd only for swrk
v3: check swrk out of loop
Cc: Saied Kazemi <saied@google.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Before the recent "x86_64,signal: Fix SS handling for signals delivered
to 64-bit programs" kernel patch, sigreturn paths forgot to restore ->ss
after return from the signal handler.
Now that the kernel has been fixed, restore_gpregs() has to initialize
->ss too, because it is no longer ignored.
Note: this is the minimal fix. In the long term we probably should not
dump/restore the segment registers at all. We can use sigcontext filled
by the target kernel and modify the general-purpose regs.
Reported-and-tested-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Here it is. The major new thing, I think, is the CRIT tool
that will be the main one to manipulate images and will
eventually replace the "criu show" action.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
and print errno for the wait syscall in an error case
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
If we parse /proc/pid/status when a task isn't stopped,
we can't be sure that a process state will not be changed.
08:58:48 Test: zdtm/live/user/static/zombie00, Namespace: 1
08:58:48 Dump log : /var/lib/jenkins/jobs/CRIU-dump/workspace/test/dump/ns/user/static/zombie00/114/1/dump.log
08:58:48 --------------------------------- grep Error ---------------------------------
08:58:48 (00.001127) Error (ptrace.c:124): SEIZE 121: task not stopped after seize
v2: don't trust errno (by xemul@)
Reported-by: Mr Jenkins
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
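
For reference, reading the state field from /proc/<pid>/status looks
roughly like the sketch below. The helper name read_task_state() is
hypothetical and this is not the code from ptrace.c; it only shows the
kind of parsing involved. The value it returns is a snapshot, so it is
only reliable once the task is known to be stopped.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the one-letter state from
 * /proc/<pid>/status ('R', 'S', 'T', 'Z', ...), or 0 on failure.
 */
static char read_task_state(pid_t pid)
{
	char path[64], line[256], state = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		/* The line looks like "State:\tR (running)". */
		if (sscanf(line, "State: %c", &state) == 1)
			break;
	}

	fclose(f);
	return state;
}
```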
|
|
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
/dev/tty stands for the current terminal, which we don't yet
support.
This is a bugfix for the upcoming stable version; proper
support for /dev/tty will be implemented separately.
Reported-by: Saied Kazemi <saied@google.com>
CC: Andrew Vagin <avagin@parallels.com>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
A test is executed by different users in the cases with and without
userns, so it can't open files which were created before.
Here is an example for ns/user/static/inotify_irmap:
13355 mkdir("/etc", 0600) = -1 EEXIST (File exists)
13355 unlink("/etc/zdtm-test") = -1 EACCES (Permission denied)
13355 creat("/etc/zdtm-test", 0600) = -1 EACCES (Permission denied)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
One test can't be executed as ns/test and ns/user/test
simultaneously, because they use the same file tree.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
For an established TCP connection, the send queue is restored in two
steps: in step (1), we retransmit the data that was sent before but not
yet acknowledged, and in step (2), we transmit the data that was never
sent outside before. The TCP_REPAIR option is disabled before step (2)
and re-enabled after step (2) (without this patch).
If the amount of data to be sent in step (2) is large, the TCP_REPAIR
flag on the socket can remain off for some time (O(milliseconds)). If a
listen() is called on another socket bound to the same port during this
time window, it fails. This is because turning TCP_REPAIR off clears
the SO_REUSEADDR flag on the socket.
This patch adds a mutex (reuseaddr_lock) per port number, so that a
listen() on a port number does not happen while SO_REUSEADDR for another
socket on the same port is off.
Thanks to Amey Deshpande <ameyd@google.com> for debugging.
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
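
A per-port lock of the kind described above can be sketched like this.
The names port_lock(), port_trylock() and port_unlock() are hypothetical,
and the sketch uses a C11 atomic flag per port as a stand-in for the
reuseaddr_lock mutex; it is not the locking primitive the patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PORTS 65536

/* One flag per port number; static storage starts out as "unlocked". */
static atomic_bool port_busy[NR_PORTS];

/* Try to take the lock for a port; returns true if we got it. */
static bool port_trylock(uint16_t port)
{
	return !atomic_exchange(&port_busy[port], true);
}

/* Take the lock for a port, spinning until the current holder drops it. */
static void port_lock(uint16_t port)
{
	while (!port_trylock(port))
		;
}

/* Release the lock for a port. */
static void port_unlock(uint16_t port)
{
	atomic_store(&port_busy[port], false);
}
```

In this scheme the code that sends the step (2) data would hold the lock
for its port while TCP_REPAIR is off, and the code that calls listen()
would take the same lock first, so the two can never overlap on one port.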
|
|
"%m" can't be used to print strerror(errno), because test_msg()
calls gettimeofday() which can overwrite errno.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
tname doesn't contain a test type.
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
"%m" can't be used to print strerror(errno), because print_on_level()
calls gettimeofday() which can overwrite errno.
For example:
13486 connect(4, {sa_family=AF_INET, sin_port=htons(8880), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENETUNREACH (Network is unreachable)
13486 gettimeofday({1423756664, 717423}, NULL) = 0
13486 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
13486 write(2, "15:57:44.717: 4: ERR: socket_udp.c:73: Can't connect (errno = 101 (Permission denied))\n", 91) = 91
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
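
The usual remedy is to snapshot errno before the logger calls anything
else. A minimal sketch, assuming a hypothetical format_err() logger and a
timestamp_ms() helper that deliberately clobbers errno to stand in for
the failed open("/etc/localtime") seen in the trace above:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical timestamp helper; like the real one, it may change errno. */
static long timestamp_ms(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	errno = EACCES;	/* stand-in for a failed open("/etc/localtime") */
	return tv.tv_usec / 1000;
}

/* Hypothetical logger: formats "<ms>: ERR: <what> (errno = N (text))". */
static void format_err(char *buf, size_t size, const char *what)
{
	int saved = errno;		/* snapshot before anything else runs */
	long ms = timestamp_ms();	/* errno is not trustworthy after this */

	snprintf(buf, size, "%03ld: ERR: %s (errno = %d (%s))",
		 ms, what, saved, strerror(saved));
}
```

Because the number and the text both come from the saved value, they can
no longer disagree the way "errno = 101 (Permission denied)" does.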
|
|
We link files to each other at restore time to restore
unlinked paths. The kernel has strange security restrictions
on the linkat we use. If the fsuid of the caller doesn't
equal the uid of the file and the file is not a "safe"
one, then only global CAP_FOWNER will be allowed to link().
This brings problems in user namespaces -- uns root is
not allowed to linkat any file, unlike global root.
Fortunately, we can change the fsuid temporarily and
still linkat the file we want. Hopefully this hack will
go away some day soon, when the kernel has saner
checks for linkat capabilities.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
The test uses map_files dir to check for mapping being restored,
while this proc directory is only available for CAP_SYS_ADMIN.
Fix this by checking the less strict /proc/pid/maps instead.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
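
A check based on /proc/pid/maps can be as simple as the sketch below. The
helper name addr_is_mapped() is hypothetical and the test's actual check
may look at different fields; the point is only that the maps file is
readable without CAP_SYS_ADMIN.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: is addr inside a mapping listed in /proc/self/maps? */
static bool addr_is_mapped(const void *addr)
{
	unsigned long start, end, a = (unsigned long)addr;
	char line[512];
	bool found = false;
	FILE *f;

	f = fopen("/proc/self/maps", "r");
	if (!f)
		return false;

	while (fgets(line, sizeof(line), f)) {
		/* Each line starts with "<start>-<end>" in hex. */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
		    a >= start && a < end) {
			found = true;
			break;
		}
	}

	fclose(f);
	return found;
}
```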
|
|
This allows excluding more userns tests from blacklist.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
The rest partially need more userns_call-s but mostly just don't
work in userns themselves. Need further investigation.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
Locked termios require global CAP_SYS_ADMIN. But let's
restore everything for tty in one call since regular
termios depend on locked and it's not nice to do sync
usernsd call for locked only.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
The syscall in question requires global CAP_DAC_READ_SEARCH.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
We have collected a good set of calls that cannot be done inside
user namespaces, but we need to [1]. Some of them have already
been addressed, like prctl mm bits restore, but some have not.
I'm pretty sceptical about the ability to relax the security
checks on quite a lot of them (e.g. open-by-handle is indeed a
very dangerous operation if allowed to an unprivileged user), so
we need some way to call those things even in user namespaces.
The good news is that all the calls I've found operate
on file descriptors one way or another. So if we had a process
that lived outside of the user namespace, we could ask it to do the
privileged operation we need and exchange the affected file
descriptor via a unix socket.
So the usernsd is the one doing exactly this. It starts before we
create the user namespace and accepts requests via unix socket.
Clients (the processes we restore) send it the functions they
want to call, the descriptor they want to operate on and the
arguments blob. Optionally, they can request some file descriptor
back after the call.
In the non-userns case the daemon is not started and the calls
are done right in the requestor's process environment.
In the next patch there's an example of how to use this daemon
to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
a socket.
[1] http://criu.org/UserNamespace
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
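
Exchanging a descriptor over a unix socket relies on SCM_RIGHTS
ancillary data. A minimal sketch of that building block follows; the
helper names send_fd() and recv_fd() are hypothetical and the real
usernsd protocol also carries the call to make and its arguments, which
is left out here.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: pass descriptor fd to the peer of unix socket sk. */
static int send_fd(int sk, int fd)
{
	char data = 0;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr mh = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *ch;

	memset(&u, 0, sizeof(u));
	ch = CMSG_FIRSTHDR(&mh);
	ch->cmsg_level = SOL_SOCKET;
	ch->cmsg_type = SCM_RIGHTS;
	ch->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(ch), &fd, sizeof(int));

	/* One byte of ordinary data carries the ancillary message along. */
	return sendmsg(sk, &mh, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd(), or -1. */
static int recv_fd(int sk)
{
	char data;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr mh = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *ch;
	int fd;

	if (recvmsg(sk, &mh, 0) != 1)
		return -1;

	ch = CMSG_FIRSTHDR(&mh);
	if (!ch || ch->cmsg_level != SOL_SOCKET ||
	    ch->cmsg_type != SCM_RIGHTS ||
	    ch->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(ch), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiver's table that
refers to the same open file, which is what lets a process outside the
user namespace operate on a file owned by a process inside it.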
|
|
User is already able to see it in stdout, so there is no
reason why we should protect it.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
If criu runs with the suid bit set, the user should be able
to read pidfiles (i.e. the service pidfile).
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
This will allow us to easily extend commands that crit
supports, avoiding "--help" confusion.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Starting with version 3.15, the kernel provides a mnt_id field in
/proc/<pid>/fdinfo/<fd>. However, the value provided by the kernel for
AUFS file descriptors obtained by opening a file in /proc/<pid>/map_files
is incorrect.
Below is an example for a Docker container running Nginx. The mntid
program below mimics CRIU by opening a file in /proc/1/map_files and
using the descriptor to obtain its mnt_id. As shown below, mnt_id is
set to 22 by the kernel but it does not exist in the mount namespace of
the container. Therefore, CRIU fails with the error:
"Unable to look up the 22 mount"
In the global namespace, 22 is the root of AUFS (/var/lib/docker/aufs).
This patch sets the mnt_id of these AUFS descriptors to -1, mimicking
pre-3.15 kernel behavior.
$ docker ps
CONTAINER ID IMAGE ...
3850a63ee857 nginx-streaming:latest ...
$ docker exec -it 38 bash -i
root@3850a63ee857:/# ps -e
PID TTY TIME CMD
1 ? 00:00:00 nginx
7 ? 00:00:00 nginx
31 ? 00:00:00 bash
46 ? 00:00:00 ps
root@3850a63ee857:/# ./mntid 1
open("/proc/1/map_files/400000-4b8000") = 3
cat /proc/49/fdinfo/3
pos: 0
flags: 0100000
mnt_id: 22
root@3850a63ee857:/# awk '{print $1 " " $2}' /proc/1/mountinfo
87 58
103 87
104 87
105 104
106 104
107 104
108 87
109 87
110 87
111 87
root@3850a63ee857:/# exit
$ grep 22 /proc/self/mountinfo
22 21 8:1 /var/lib/docker/aufs /var/lib/docker/aufs ...
44 22 0:35 / /var/lib/docker/aufs/mnt/<ID> ...
$
Signed-off-by: Saied Kazemi <saied@google.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
If a process is executed in another pidns, /proc/PID doesn't refer to
the proper process.
This patch fixes a problem like this:
1: Error (util.c:106): Unable to close fd 33: Bad file descriptor
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
There are two places where we store IP addresses (both IPv4 and IPv6).
Mark them with a custom option and print them in compressed form in
--pretty output.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
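
To illustrate what "compressed form" means for an address, here is a
small C sketch built on inet_ntop(). The helper name format_ip() is
hypothetical, and it assumes the address is available as raw bytes in
network order; crit itself is a separate tool and this is not its code.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format raw address bytes (4 for IPv4, 16 for
 * IPv6) as text. Returns out on success, NULL on failure.
 */
static const char *format_ip(const unsigned char *raw, size_t len,
			     char *out, socklen_t size)
{
	int family;

	if (len == 4)
		family = AF_INET;
	else if (len == 16)
		family = AF_INET6;
	else
		return NULL;

	/* For IPv6, inet_ntop() collapses the longest run of zero groups. */
	return inet_ntop(family, raw, out, size);
}
```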
|
|
I plan to mark some fields as IP addresses and print them accordingly.
The --format hex switch is not a nice fit for this, and introducing one
more (--format hex ipadd) would be too bad.
So let's fix the crit API to be simple and stupid. By default crit
generates canonical one-line JSON. With the --pretty option it splits the
output into lines, adds indentation and prints hex as hex and IP as
IP.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <kupruser@gmail.com>
|
|
Can be useful to re-run some tests in case something failed in the middle.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|
|
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
|