This is the 20th anniversary edition of FOSDEM, and it's big.
The convergence of the geeks.
{section "General"}
There are live streams for all talks, which is nice because there are
also quite long lines when waiting to enter a room.
* If you want to watch my retrocomputing talk about Alpha Waves, it
will be live at 3PM here.
* My lightning talk about Concept Programming should be live at
6:20PM in the Lightning Talks room.
* My talk about the XL programming language is tomorrow at 10:10. You
should be able to watch it live from this stream if you have nothing
else to do on your Sunday morning.
As an aside, I noticed that various places had links to outdated
Facebook and Twitter entries. A while ago, I dropped both Facebook and
Twitter, and I reluctantly came back, but could not get my old
handles. So on Twitter, it's `@zanyware` now and not `@descubes`, and
on Facebook, it's `christophe.dedinechin.18` (I doubt there are 18 of
us, but oh well...)
A bit concerned with the battery lifetime of my laptop. Yesterday, it
stopped at around 50%. Hopefully I won't have the problem today.
{section "Keynote: We have to finish that thing one day"}
Keynote given by Thorsten Leemhuis. As usual, it's crisp, very
detailed, very informed.
The key topic of the talk is how Linux wins by solving big
problems little by little. "Solve big problems in small
steps". Interestingly, one of the examples he took is containers, and
he retraced some of the history that I outlined in my
DevConf.cz 2020 talk.
Talks about BPF, cBPF vs eBPF, mentions of it replacing the Linux
kernel someday ;-) This is not as outlandish as it sounds, IMO. Talks
about DTrace, etc.
Also gives interesting counter-examples, e.g. BTRFS vs. ZFS. Does not
however mention that ZFS is cross-platform, whereas BTRFS is
not. BTRFS was initially overhyped. Will Linux some day get something
that competes credibly with ZFS? Probably yes, but will take 10-20
years. Will it be `bcachefs`? Not submitted upstream yet.
Problems of Linux kernel development: no central forge, everything
done through mail, long unstable development phases. Reminds us that
initially we did not even have a version control system. But then got
`git` in 2005, the second world-changing project by Linus. Got a
mostly predictable release cycle. Got stable and long term
kernels. Hundreds of mailing lists. Still no automated code
checking in a central place (is that a bad thing?) The amazing thing
is that he could say such things without any reaction from the
audience. No booohs, no aaahs.
Should the Linux Foundation help more? "Not sure about that". Linux
development runs at the usual pace. "Famous last words, but the patch
volume has to drop off one day" (Andrew Morton).
Gave a link to Brendan D. Gregg's
page, which seems to be a raw collection of links. At some point,
I need to spend the time reading all that.
*Opinion* I truly believe this guy should be part of the teams
building "whole Linux" documentation packages. He knows what changes,
he knows how to explain it, and he knows how to make sense of a large
pile of somewhat unrelated topics.
{section "Kata Containers on openSUSE"}
Talk about Kata Containers on openSUSE. Curious to see if I will
learn anything interesting ;-)
Starts with the very basics, i.e. "running containers in virtual machines".
"If you want to escape the container, you need to escape two layers,
so that's improved isolation". I don't think this is necessarily true
given the number of technologies designed to bypass overhead,
e.g. you could end up controlling a network card virtual function (VF)
directly when you use DPDK.
They are using a smaller kernel in the container, called "KVM small".
Want to use QEMU microVM.
OCI compatible.
Mention replacement of `9pfs` (slow) with `virtiofs`. Did not know
exactly when it was merged into the kernel. I think it's 5.3.15,
definitely there in 5.4, though the `qemu` part of it was only merged
last week.
Mention that a small change has to be made to be able to run
`rootless`. Need to add runtimes in `libpod.conf`, because they use a
non-standard path. So you can add that in the `kata-runtime` section
of `libpod.conf`.
{section "Evolution of kube-proxy"}
Datadog has tens of thousands of hosts in their infra, and they were
hitting scalability limits.
kube-proxy running on each k8s node. Implements the k8s service
abstraction.
Initial proxy implementation was in user space. "Proxy mode = userspace".
An `iptables` rule redirects traffic to the proxy, which will do the
load-balancing between nodes. Prerouting sends to portal containers.
Limitations: performance, and source IP cannot be kept. Since k8s
1.2, default is `iptables`.
Another limitation is that `iptables` was not designed for load
balancing. It's hard to debug with 10K rules. Performance impact:
* Control plane: syncing rules.
* Data plane: going through rules.
20K services = 160K rules = 5 hours to load them.
Proxy mode = `ipvs` (they only started talking about it halfway
through the talk, which I believe was a bit late).
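A toy model of why the data plane cost differs between the two modes: an `iptables`-style chain is walked rule by rule, whereas `ipvs` keeps its services in a hash table. The names `build_services`, `linear_match` and `hashed_match` are made up for this sketch, and it only models the lookup, not real netfilter behavior.

```python
# Toy model: linear rule scan (iptables-like) vs hash lookup (ipvs-like).
# All names here are hypothetical; nothing talks to netfilter.

def build_services(count):
    """Return a list of unique (virtual_ip, port) pairs, one per fake service."""
    return [
        (f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}", 80)
        for i in range(count)
    ]

def linear_match(rules, key):
    """Scan rules in order and return (index, number of comparisons)."""
    for comparisons, rule in enumerate(rules, start=1):
        if rule == key:
            return comparisons - 1, comparisons
    return None, len(rules)

def hashed_match(table, key):
    """Single dictionary lookup, independent of the number of services."""
    return table.get(key)

services = build_services(20_000)
table = {svc: idx for idx, svc in enumerate(services)}
last = services[-1]

index, comparisons = linear_match(services, last)
print(f"linear scan: matched rule {index} after {comparisons} comparisons")
print(f"hash lookup: matched rule {hashed_match(table, last)} in one lookup")
```

The worst case for the linear scan grows with the number of services, which is the data plane problem listed above.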
Service with 2K endpoints, ~100 B per endpoint, 5K nodes. Each node
gets 2K x 100 B = 200 KB.
Addressed recently in k8s with endpoint slices. Maximum 100 endpoints
in each slice. Much more efficient for services with many
endpoints. Beta in k8s 1.17.
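A back-of-the-envelope sketch of these figures. It assumes that, without slices, a change to one endpoint resends the whole endpoint list to every node, and that with slices only the affected slice is resent. The function name `bytes_per_update` is invented for the example.

```python
# Rough traffic estimate for one endpoint change, using the talk's figures.
ENDPOINT_BYTES = 100   # approximate size of one endpoint entry
ENDPOINTS = 2_000      # endpoints behind the service
NODES = 5_000          # nodes watching the service
SLICE_SIZE = 100       # maximum endpoints per slice

def bytes_per_update(endpoints_sent, nodes=NODES, endpoint_bytes=ENDPOINT_BYTES):
    """Return (bytes per node, bytes for the whole cluster) for one update."""
    per_node = endpoints_sent * endpoint_bytes
    return per_node, per_node * nodes

# Assumption: without slices, the full list goes to every node.
full_node, full_cluster = bytes_per_update(ENDPOINTS)
# Assumption: with slices, only the slice holding the changed endpoint goes out.
slice_node, slice_cluster = bytes_per_update(SLICE_SIZE)

print(f"full object: {full_node / 1e3:.0f} kB per node, "
      f"{full_cluster / 1e9:.1f} GB in total")
print(f"one slice:   {slice_node / 1e3:.0f} kB per node, "
      f"{slice_cluster / 1e6:.0f} MB in total")
```

Under these assumptions, a single change costs about 1 GB cluster-wide without slices and about 50 MB with them.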
{section "Containers live migration"}
A talk by Adrian Reber about how to transfer a running container around.
CRIU: Checkpoint/Restore In Userspace
* First step: checkpoint a container using `ptrace()`.
* Generate parasite code, injected into the process :-(
* Then restart the process with the parasite code, daemon waiting for
commands.
* Then checkpoint continues.
That all sounds perfectly reasonable to me ;-)
"If you run with podman, you probably have SELinux, and CRIU does
things that SELinux does not really approve of". NSS (No Shit
Sherlock, said in a work-safe place).
There is another talk about this that is probably interesting.
Use `clone3()` for each PID/TID, which might be better.
One user of this is Google, which uses it in Borg, their container
runtime, to live-migrate processes in production a lot. Apparently
happy with how it works. LXC/LXD has a long history of CRIU
integration. For Docker, need an experimental mode to use it,
unmaintained.
Useful commands:
{{{
podman container checkpoint
podman container restore
}}}
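To move a container between hosts, the checkpoint has to be written to a file that can be copied over. A minimal sketch that builds the two command lines, assuming the `--export` and `--import` options of `podman container checkpoint` and `podman container restore`. The container name and archive path are hypothetical, and the commands are printed rather than executed since they need root and CRIU.

```python
import shlex

def checkpoint_cmd(container, archive):
    """Command line to checkpoint `container` into a tar archive."""
    return ["podman", "container", "checkpoint", f"--export={archive}", container]

def restore_cmd(archive):
    """Command line to restore a container from a checkpoint archive."""
    return ["podman", "container", "restore", f"--import={archive}"]

# Hypothetical names: replace with a real container and a writable path.
container = "webserver"
archive = "/tmp/webserver-checkpoint.tar.gz"

for cmd in (checkpoint_cmd(container, archive), restore_cmd(archive)):
    # Printed only; pass `cmd` to subprocess.run() on a host with podman + CRIU.
    print(shlex.join(cmd))
```

Between the two commands, the archive would be copied to the destination host by whatever means is convenient.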
Q: Stuff from Borg tends to flow into k8s. Will this happen in k8s?
A: No sign of this happening. Problem is that containers are
stateless, why would you want to migrate them.
Tried to migrate a database, but database shut down after migration,
which might be caused by time differences. Time namespace has been
accepted in Linux, might help.
{section "Supervising and emulating syscalls"}
Talk about how to intercept syscalls.
Seccomp runs before the syscall. Seccomp never blocks. It asks
userspace for return value and errno. Execution does not continue in
the kernel, userspace must do the work.
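A toy model of that flow, to fix ideas: the supervisor receives a description of the syscall and answers with a return value and an errno. Nothing here calls the kernel, and `Notification` and `supervise` are hypothetical names, not the seccomp API.

```python
import errno

# Toy model of a user-space syscall supervisor. Purely illustrative:
# the real mechanism uses a seccomp notification file descriptor.

class Notification:
    """Description of an intercepted syscall, as the supervisor sees it."""
    def __init__(self, syscall, args):
        self.syscall = syscall
        self.args = args

def supervise(notification, allowed_devices):
    """Decide the syscall result in user space: return (return_value, errno)."""
    if notification.syscall != "mknod":
        return -1, errno.ENOSYS      # not emulated by this supervisor
    path, dev_type, major, minor = notification.args
    if (dev_type, major, minor) in allowed_devices:
        # A real supervisor would create the node on the container's behalf.
        return 0, 0
    return -1, errno.EPERM           # refuse every other device

allowed = {("c", 5, 1)}              # allow character device 5:1 only
print(supervise(Notification("mknod", ("bbb", "c", 5, 1)), allowed))
print(supervise(Notification("mknod", ("sda", "b", 8, 0)), allowed))
```

The point of the model is that the decision and the work both happen in the supervisor, and the calling process only ever sees the pair it returns.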
Slides are a bit weak on content compared to what is being said, which
is very dense. So this is the typical case where not listening to the
talk for 30 seconds gets you totally lost and you cannot recover by
looking at the slides. Christian Brauner clearly knows what he's
talking about, but there is much (too much?) more in the talk than
what is on the slides.
Uses `lxc` for the demo. Demo starts with `cat /proc/self/uid_map` and
`mknod bbb c 5 1` trying to create a device. The amazing thing is that
he is explaining what is happening under the hood, so not as obscure
as it might first seem.
{{{
lxc config set f1 security.syscalls.intercept.mknod true
lxc restart --force f1
lxc shell f1
}}}
Then can do an `mknod` and then `stat ./bbb` (the device node just
created).
Showed that the policies also allow controlling `mount` and the
associated filesystem types, e.g. `mount.ext4`.
{section "Below Kubernetes: Demystifying container runtimes"}
Talk about what is happening below k8s. More specifically, the space
between k8s and the Linux kernel.
It's a "mess of overlapping projects and products". (glad it's not
just me). "How many different meanings can _container runtime_ have?"
OCI established circa 2015 to try and unify things.
Container runtime interface established Dec 2016. Primitives to manage
pods of containers. A single interface for `rkt` and `docker`.
Thierry Carrez is creating diagrams that look way too similar to what I
showed at DevConf.cz (i.e. increasing number of boxes showing up as
time goes by). Current state looks like this:
{picc "images/200201-Containers.jpg"}
I will clearly need to link to the slides or video if they are posted,
because the evolution is funny.
Time for me to play with mermaid:
{diagram}
graph TD
k8s[Kubernetes] --> ccd[cri-containerd]
ccd --> cd[containerd]
cli[docker CLI] --> cd
ccd --> runc[runC]
{margaid}
"The dirty secret of containers: they are not very good at containing".
In the real world, they run in VMs.
Firecracker is a "highly opinionated runtime".
"That is when the diagram began to become too complex", e.g. directly
connect `containerd` to `firecracker`. Also the case for Kata
Containers to "leverage advanced features", i.e. things that are not
in the OCI runtime interface.
{section "Alpha Waves"}
Since it takes a lot of time to switch rooms, that was the time I
decided to leave the "Containers" track and join the "Retrocomputing"
track. I almost had a major accident, splashing water over my keyboard
minutes before the talk, but fortunately no damage. I was concerned
because regular Apple keyboards are notoriously sensitive to liquids. I
already lost at least 3 keyboards to a single splash of liquid by one
of my kids. Apparently, better with the PowerBook.
{section "BASICODE"}
Learned about something called BASICODE, which I had never heard
of. A way to send BASIC programs over the airwaves, with an API
made of `GOSUB` subroutines with pre-determined line numbers. So
something like `GOSUB 100` would do a "clear screen" whether the
program ran on Apple II or Sinclair Spectrum. Super weird. One or two
people in the room had actually used it to download software from
radio programs.
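The idea of an API made of fixed line numbers can be sketched as a dispatch table: the program only knows the numbers, and each machine plugs its own routines behind them. The routine bodies and the second line number (110) are illustrative assumptions, not taken from the BASICODE standard.

```python
# Sketch of the BASICODE idea: portable programs call fixed line numbers,
# and each machine supplies its own implementation behind each number.

def make_platform(name):
    """Return (routines, log): routines maps GOSUB targets to callables."""
    log = []
    routines = {
        100: lambda: log.append(f"{name}: clear screen"),
        110: lambda: log.append(f"{name}: move cursor"),   # assumed number
    }
    return routines, log

def run(program, routines):
    """Run a 'program' given as a list of GOSUB targets."""
    for line in program:
        routines[line]()

program = [100, 110]   # the same portable program for every machine

for machine in ("Apple II", "Sinclair Spectrum"):
    routines, log = make_platform(machine)
    run(program, routines)
    print(log)
```

The program text never changes; only the table behind the line numbers does, which is what made broadcasting a single listing to different machines workable.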
{section "Retro music - Open Cubic Player"}
An interesting talk about music in the old days (Amiga `.mod` files
if you can remember that).
Nostalgia for adlib sound.
"OpenSource was a real eye opener for learning how to program"
"Your multimedia program is an operating system in itself, except file
system control"
{section "Reviving le MINITEL"}
(I briefly mentioned Minitel during my talk, there was a "3615 Infogrames"
sign on one of the Alpha Waves boxes.)
In the late 60s, France had only 15% phone lines, 3 years installation
delay (vs 90% and 3 days for the US). The last manual switch was
decommissioned in 1978; the first automated switch had been tested in
1912.
France had a plan. Packet switched network "Transpac", from 50 bits/s
for telex up to 64kbit/s. Heterogeneous. New pricing, depending on
rate and connection time, not on distance.
Transpac had B2B applications (banks, etc), and B2C applications
(Minitel).
Minitel = Videotex screen with a keyboard connected with a
modem. 1974: BBC's Ceefax. 1979, CCETT's Antiope.
Microcomputing in France was hard to grasp, no
network. Very late on modems. Free terminal. One Minitel cost 260
euros (1000F) to build. 6.5M Minitels installed. 750M euros of
installed Minitels. Paid by the French state, which needed to get its
money back.
40 columns x 25 rows, could display 8 colors shown as shades of
gray. Rate 1200 baud download, upload at 75 bits/s. Videotex offered
64 mosaic characters, which divided characters in 2x3 pixels, so could
have 80x72 "graphic" resolution.
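A small sketch of where 64 and 80x72 come from: six on/off cells give 2^6 shapes, and each text cell contributes 2x3 pixels. The bit order in `mosaic_code` is an assumption of this example, not necessarily the Videotex encoding, and the 24 usable rows are inferred from the 72-line figure.

```python
# Each mosaic character is a 2-wide by 3-high grid of on/off cells.
CELL_COLS, CELL_ROWS = 2, 3

def mosaic_code(cells):
    """Pack a 3-row by 2-column grid of 0/1 cells into a number from 0 to 63.

    Bit order (left to right, top to bottom) is an assumption of this sketch.
    """
    code = 0
    bit = 0
    for row in cells:
        for cell in row:
            code |= (cell & 1) << bit
            bit += 1
    return code

def graphic_resolution(text_cols, text_rows):
    """Pixel grid obtained when every character cell holds a mosaic."""
    return text_cols * CELL_COLS, text_rows * CELL_ROWS

# Number of distinct mosaic characters: 2 ** 6.
print(2 ** (CELL_COLS * CELL_ROWS))
# Top and bottom rows lit, middle row dark.
print(mosaic_code([[1, 1], [0, 0], [1, 1]]))
# 80x72 matches 24 text rows; presumably one of the 25 rows is reserved.
print(graphic_resolution(40, 24))
```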
Very complex set of attributes, each character is coded as 16 bits in
the Minitel memory. Could encode 449 characters.
They even went as far as reconstructing Minitel services, and you
could see the display reconstruction in its full 1200 baud glory,
including a famous "Minitel Rose", 3615 ULLA, which was plastering all
the walls in France for years with little stickers advertising their
service.
{picc "images/200201-Minitel.jpg"}
{section "Gate project"}
Portable execution state, i.e. compiles stuff to wasm, and then
can suspend on one machine (x86) and resume execution on a different
machine (arm64).
Reminded me a lot of
TAOS (or at least the promises it had), but done with modern
technologies. It only took 25 years.
The room is almost empty. Lightning talks are not very
successful. Also, running late (10 minutes behind schedule). Had I
known, I would probably have watched the
VIC-20 cartridge reverse engineering talk, and missed the talk
about the Gate project. That is one of the talks I had marked on my
calendar, so I'm happy it happened that way.
{section "The pool next to the ocean"}
Subtitle: How to bring open source skills to more people.
Talk about contribution.
{section "Tracking local storage configuration on linux"}
Using continuous recording of disk events in order to be able to
recreate a configuration after the fact.
Command is `lsblkj` which filters stuff from the journal (the `j` in
the command name) and extracts stuff that matters
wrt. storage. Behaves like `lsblk`, but misses filesystem info
because that is not logged yet.
The user interface reminds me of Patrick Duverger's "open data map".
Need to look at Skydive later.
(Rant)
The projector has a lot of echoing. I'm glad I'm using a pastel
background instead of straight black and white. We'll see how it
plays "for real".
{section "In other news (not FOSDEM): Blogmax improvements"}
Still working on improving the
BlogMax package to fit my needs. So you will sometimes see
incorrectly formatted output on this page. You can always refer
to the source code for reference.
As I write this, for example, this is the case for `tt` formatting,
and the root cause seems to be that Emacs Lisp
`replace-regexp-in-string` does not seem to recognize the `\\b` form
(word boundary). It does work in `re-search-forward`, so that's
weird. Also, simple experiments with the `*scratch*` buffer in Emacs
show that this is not really the problem.
{{{
(replace-regexp-in-string "\\btoto\\b" "tata\\&" "this is toto and atoto")
"this is tatatoto and atoto"
}}}
But then:
{{{
(replace-regexp-in-string "\\b_\\b" "\\1" "This is _an example_")
"This is _an example"
(replace-regexp-in-string "\\b_\\b" "\\&" "This is _an example_")
"This is _an example_"
}}}
WAT? How is `e` even seen as a word boundary? Something is definitely
wrong with this function. Ah, no. The `_` itself _makes it_ a word
boundary. Let me check that idea:
{{{
(replace-regexp-in-string "\\b." "[\\&]" "This is _an example_")
"[T]his[ ][i]s[ ]_[a]n[ ][e]xample[_]"
}}}
OK, so then the correct regexp syntax is to use `\s-` in the regexp
and reinject whatever was matched, i.e.:
{{{
("\\(\\s-\\)_\\(.*?\\)_" "\\1\\2")
("\\(\\s-\\)\\*\\(.*?\\)\\*" "\\1\\2")
("\\(\\s-\\)`\\(.*?\\)`" "\\1\\2")
}}}
That seems to work.
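For readers who want to try the same approach outside Emacs, here is a rough Python rendition of the three rules. It assumes Python's `\s` is a close enough stand-in for the Emacs `\s-` whitespace class, and `strip_markers` is a name invented for this sketch. As in the rules shown, the markers are simply dropped and the whitespace plus the marked text are put back.

```python
import re

# Each rule: a marker preceded by whitespace, lazily matched up to the
# closing marker. Group 1 is the whitespace, group 2 the marked text.
RULES = [
    (re.compile(r"(\s)_(.*?)_"), r"\1\2"),
    (re.compile(r"(\s)\*(.*?)\*"), r"\1\2"),
    (re.compile(r"(\s)`(.*?)`"), r"\1\2"),
]

def strip_markers(text):
    """Apply each rule in turn to `text` and return the result."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(strip_markers("This is _an example_ with *bold* and `code`"))
```

One consequence of requiring leading whitespace is that a marker at the very start of a line is left alone, which may or may not be what is wanted.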