Bug 13985 - CTDB /etc/ctdb/functions wrong ss syntax in update_tickles()
Summary: CTDB /etc/ctdb/functions wrong ss syntax in update_tickles()
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB (show other bugs)
Version: 4.10.4
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Amitay Isaacs
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-04 00:35 UTC by Rafael David Tinoco
Modified: 2019-06-06 12:36 UTC (History)
3 users (show)

See Also:


Attachments
ctdb-scripts: Fix ss syntax in update_tickles() (3.59 KB, patch)
2019-06-04 01:48 UTC, Rafael David Tinoco
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rafael David Tinoco 2019-06-04 00:35:27 UTC
During CTDB execution I got the following error:

019/05/31 18:21:38.648661 ctdbd[1000]: Starting traverse on DB ctdb.tdb (id 806)
2019/05/31 18:21:38.651601 ctdbd[1000]: Ending traverse on DB ctdb.tdb (id 806), records 0
2019/05/31 18:21:40.185388 ctdb-eventd[1002]: 60.nfs: ss: bison bellows (while parsing filter): "syntax error!" Sorry.
2019/05/31 18:21:40.185421 ctdb-eventd[1002]: 60.nfs: Usage: ss [ OPTIONS ]
2019/05/31 18:21:40.185433 ctdb-eventd[1002]: 60.nfs:        ss [ OPTIONS ] [ FILTER ]
2019/05/31 18:21:40.185441 ctdb-eventd[1002]: 60.nfs:    -h, --help          this message
2019/05/31 18:21:40.185449 ctdb-eventd[1002]: 60.nfs:    -V, --version       output version information

Enabling debug in /etc/ctdb/functions (-x) I got the following:

60.nfs: + _port=2049
60.nfs: + tickledir=/var/lib/ctdb/scripts/tickles
60.nfs: + mkdir -p /var/lib/ctdb/scripts/tickles
60.nfs: + ctdb_get_pnn
60.nfs: + _pnn_file=/var/lib/ctdb/scripts/my-pnn
60.nfs: + [ ! -f /var/lib/ctdb/scripts/my-pnn ]
60.nfs: + cat /var/lib/ctdb/scripts/my-pnn
60.nfs: + _pnn=0
60.nfs: + /usr/bin/ctdb -X ip
60.nfs: + awk -F| -v pnn=0 $3 == pnn {print $2}it
60.nfs: + _ips=172.16.17.3
60.nfs: + _ip_filter=
60.nfs: + _ip_filter=src [172.16.17.3]
60.nfs: + _port_filter=sport == :2049
60.nfs: + _my_connections=/var/lib/ctdb/scripts/tickles/2049.connections.12623
60.nfs: + ss -tn state established ( src [172.16.17.3] ) ( sport == :2049 )

Showing that the problem was in ss syntax coming out of update_tickles() in /etc/ctdb/functions.

I have fixed it so instead of:

ss -tn state established ( src [172.16.17.3] ) ( sport == :2049 )

Like the script is executing now, proper ss syntax is created and executed, like:

ss -tn state established '( src [172.16.17.3] ) && sport == :2049'
OR
ss -tn state established '( src [172.16.17.2] || src [172.16.17.3] ) && sport == :2049'

depending on the number of _ips in script logic.

I'm posting the patch to the mailing list pointing out this bug.
Comment 1 Rafael David Tinoco 2019-06-04 01:46:54 UTC
I have provided the patch in the following e-mail:

https://lists.samba.org/archive/samba-technical/2019-June/133701.html

And I'm also attaching it to this case.

Thanks for considering it.

Rafael
--
Rafael D. Tinoco
Comment 2 Rafael David Tinoco 2019-06-04 01:48:10 UTC
Created attachment 15216 [details]
ctdb-scripts: Fix ss syntax in update_tickles()
Comment 3 Martin Schwenke 2019-06-04 04:44:45 UTC
Hi Rafael,

While I understand the change your patch is making, I don't understand why the existing syntax for the ss command does not work.  :-(

When I try this out by hand on a Debian unstable system by ssh-ing to localhost, this works:

$ ss -tn state established '( src [127.0.0.1] )' '( sport  == :22 )'
Recv-Q         Send-Q                   Local Address:Port                   Peer Address:Port          
0              0                            127.0.0.1:22                        127.0.0.1:48640         

Can you please try some examples by hand to try to understand exactly why the command is failing?

I'm wondering if there has been some change in upstream ss (which should be reflected in Debian unstable), if the system you're using patches ss in some unique way, or if I'm being thick and missing something obvious after a late night.  :-)

Thanks...
Comment 4 Rafael David Tinoco 2019-06-04 11:23:48 UTC
Hello Martin,

Errrr, that puzzled me now, my workstation is a debian sid and, for obvious reasons, all my development environment is Ubuntu... I re-checked "ss" execution in Debian and it worked :\, went back to Ubuntu Eoan and it didn't.

inaddy@workstation:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 172.16.0.3:22 172.16.0.6:53580
0 0 172.16.0.3:22 172.16.0.6:62337
0 0 172.16.0.3:22 172.16.0.6:53587
...

(k)inaddy@ctdbserver01:~$ ss -tn state established '( src [172.16.0.3] || src [172.16.0.3] ) ( sport == :22 )' | small
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help          this message
   -V, --version       output version information
   -n, --numeric       don't resolve service names
   -r, --resolve       resolve host names
   -a, --all           display all sockets
...

Checking different versions:

inaddy@workstation:~$ rmadison iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.16.0-2|oldstable
iproute2|4.9.0-1+deb9u1|stable
iproute2|4.9.0-1+deb9u1|stable-debug
iproute2|4.14.1-1~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports
iproute2|4.20.0-2~bpo9+1|stretch-backports-debug
iproute2|4.20.0-2|testing
iproute2|4.20.0-2|unstable
iproute2|4.20.0-2|unstable-debug
iproute2|5.1.0-1|experimental
iproute2|5.1.0-1|experimental-debug

inaddy@workstation:~$ rmad iproute2 | awk '{print $1 $2 $3 $4 $5}'
iproute2|3.12.0-2|trusty
iproute2|3.12.0-2ubuntu1.2|trusty-updates
iproute2|4.3.0-1ubuntu3|xenial
iproute2|4.3.0-1ubuntu3.16.04.5|xenial-updates
iproute2|4.15.0-2ubuntu1|bionic
iproute2|4.18.0-1ubuntu2~ubuntu18.04.1|bionic-backports
iproute2|4.18.0-1ubuntu2|cosmic
iproute2|4.18.0-1ubuntu2|disco
iproute2|4.18.0-1ubuntu2|eoan

Since Ubuntu is using an older version for quite awhile, I think latest change to this line broke compatibility with older versions. That, per se, could justify this patch. 

commit 04fe9e20749985c71fef1bce7f6e4c439fe11c81
Author: Martin Schwenke <martin@meltin.net>
Date:   Thu Aug 27 00:22:49 2015

    ctdb-scripts: Use ss instead of netstat for finding TCP connections

    ss with a filter is much faster than post-processing output from
    netstat.  CTDB already has a hard dependency on iproute2 for IP
    address handling, so depending on ss is no big deal.

    Signed-off-by: Martin Schwenke <martin@meltin.net>
    Reviewed-by: Amitay Isaacs <amitay@gmail.com>

This is the change that introduced this "ss" code, and, I believe that this has been broken in Ubuntu since then. Actually I'm not aware that CTDB ever worked in Ubuntu the way it should, that's why I'm working in the following Bugs:

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/722201
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1821775
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1828799
https://bugs.launchpad.net/ubuntu-server-ha/+bug/1831381
https://bugzilla.samba.org/show_bug.cgi?id=13984
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929931

To make it fully function for Eoan, and backport fixes up to Bionic, *at least*. Would you mind including this patch so we can have iproute2 backwards compatibility ? (Upgrading iproute2 in Ubuntu would mean re-certificating a bunch of stuff depending on it, including all openstack products, etc).

Thanks a lot!

Best Regards,
Rafael
Comment 5 Martin Schwenke 2019-06-05 15:47:45 UTC
Hi Rafael,

Sorry for taking a while to respond.  I was waiting for a response from you and then I realised I wasn't subscribed to this bug.  ;-)

I think that if there is a problem here then we should fix it.  However, I would like to determine the exact problem.

I think your patch does a bit too much.  I don't think we need to change the way $_ip_filter is constructed.   The syntax that uses "||" should be OK, and the current loop is nice and lightweight because it doesn't fork any subprocesses.  I also don't think we need to add the "&&" -if there are 2 conditions then they are implicitly and-ed.  We probably also don't need the early returns, though I need to think a bit more about that...  it would be a separate bug.

If there is a problem I think we need to figure out exactly what it is.  We probably only need to fix the final command-line.

The current code generates a command line like this:

$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
Recv-Q         Send-Q                    Local Address:Port                   Peer Address:Port         

The above was run on an Ubuntu 18.04 machine with the following version of iproute2 installed:

$ dpkg -l iproute2
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                 Version         Architecture    Description
+++-====================-===============-===============-=============================================
ii  iproute2             4.15.0-2ubuntu1 amd64           networking and traffic control tools

It didn't fail.  Sorry if I've missed something in your explanation...

Can you please run some experiments on the command-line and figure out exactly what causes the failure and which versions of Ubuntu's iproute2 are affected?

I'm going crazy trying to find the documentation for ss filters.  I would be very grateful if you can point me at it!  :-)
Comment 6 Rafael David Tinoco 2019-06-05 16:19:35 UTC
Hello Martin,

No problem at all!

My tests:

----

# Ubuntu Ubuntu 16.04 LTS (Xenial Xerus)

(c)inaddy@xenial:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
Recv-Q Send-Q                    Local Address:Port                                   Peer Address:Port    

# Debian Sid

inaddy@workstation:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
Recv-Q            Send-Q                         Local Address:Port                         Peer Address:Port  

# Ubuntu 18.04 LTS (Bionic Beaver)

(c)inaddy@bionic:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
Recv-Q           Send-Q                          Local Address:Port                         Peer Address:Port

# Ubuntu 18.10 (Cosmic Cuttlefish)

(c)inaddy@cosmic:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]
   -h, --help          this message

# Ubuntu 19.04 (Disco Dingo)

(c)inaddy@disco:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]

# Ubuntu 19.10 (Eoan Ermine)

(c)inaddy@eoan:~$ ss -tn state established "( src [172.16.17.2] || src [172.16.17.3] )" "( sport == :2049 )"
ss: bison bellows (while parsing filter): "syntax error!" Sorry.
Usage: ss [ OPTIONS ]
       ss [ OPTIONS ] [ FILTER ]

----

Let me go check what happened in between Bionic and Cosmic and I'll get back to you.
Comment 7 Rafael David Tinoco 2019-06-05 17:04:32 UTC
Martin,

https://bugs.launchpad.net/ubuntu/+source/iproute2/+bug/1831775/comments/3

Found the fix, will fix Cosmic and Disco. I have asked iproute2 maintainer to merge to latest in Eoan. All set on our side, sorry for the burden here, this bug was a tricky one to track (thought it was bashism, didn't realize difference in between Debian and Ubuntu when preparing the packages, etc).

Thanks a lot for your support! Feel free to close this. Will reply to mailing list thread pointing out this comment.

Cheers o/
Comment 8 Martin Schwenke 2019-06-05 20:43:49 UTC
Hi Rafael,

Thanks for looking into that.

So, was the problem that iproute2 upstream commit b2038cc0b2403e8c5126cfcf45f6ee48ac549ad0 introduced a regression and commit 38d209ecf2ae966b9b25de4acb60cdffb0e06ced fixed it?

Thanks...
Comment 9 Rafael David Tinoco 2019-06-06 01:17:30 UTC
On 05/06/2019 17:43, samba-bugs@samba.org wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=13985
>
> Martin Schwenke <martins@samba.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Resolution|INVALID                     |FIXED
>
> --- Comment #8 from Martin Schwenke <martins@samba.org> ---
> Hi Rafael,
>
> Thanks for looking into that.
>
> So, was the problem that iproute2 upstream commit
> b2038cc0b2403e8c5126cfcf45f6ee48ac549ad0 introduced a regression and commit
> 38d209ecf2ae966b9b25de4acb60cdffb0e06ced fixed it?

$ git tag --contains b2038cc0b2403e8c5126cfcf45f6ee48ac549ad0
v4.16.0
v4.17.0
v4.18.0
v4.19.0
v4.20.0
v5.0.0
v5.1.0

I believe so... and based on Ubuntu iproute2 versions:

iproute2 3.12.0-2 trusty
iproute2 3.12.0-2ubuntu1.2 trusty-updates
iproute2 4.3.0-1ubuntu3 xenial
iproute2 4.3.0-1ubuntu3.16.04.5 xenial-updates
iproute2 4.15.0-2ubuntu1 bionic
iproute2 4.18.0-1ubuntu2~ubuntu18.04.1 bionic-backports
iproute2 4.18.0-1ubuntu2 cosmic
iproute2 4.18.0-1ubuntu2 disco
iproute2 4.18.0-1ubuntu2 eoan

it also makes sense, since it got broken in cosmic and beyond.

We already scheduled a merge to 4.20.0-2 (Sid), based on:

$ git tag --contains 38d209ecf2ae966b9b25de4acb60cdffb0e06ced
v4.19.0
v4.20.0
v5.0.0
v5.1.0

For the Ubuntu SRUs (Stable Release Updates):

The bug tracking this issue is:

https://bugs.launchpad.net/ubuntu/+source/samba/+bug/722201

> Thanks...
Thank u!
Comment 10 Martin Schwenke 2019-06-06 12:36:51 UTC
Excellent!  I'm glad we both managed to understand the problem!  :-)