Hi Ivan,
Thanks for the elaborate investigation and report!
As far as I understand (which may be the limiting factor here) hashing
takes place twice, one time before and one time after the microservices
have been called. And then in addition is it possible to manipulate an
identifier on which the hash is based inside a microservice, which leads
to possible discrepancies, especially if an attribute (ePTID) and the
saml nameID need to be the same. I agree fully that we need to change
this behavior especially as hashing 2 times is weird and unneeded. I am
wondering though if just removing is the best solution?
There are a few scenarios possible here:
- we just need a hash over some identifier we received from and Idp/OP
and use that to build a response to an SP/RP. Typically e.g. when you go
from or between SAML / OIDC or vise versa, without too much magic from
the microservices. In this case it makes sense to let an identifier hash
be generated just before constructing the response, I think
- we need an identifier based on the result of a microservice. In this
case again it makes sense to let an identifier hash be generated just
before constructing the response
- we need the ability to align the value of an attribute with the value
of the nameid
- we may have special cases where we create both the identifier as well
as the hash inside the microservice and do not want satosa to touch it
anymore.
- we may have a case where we want to just pass on what we got from the
Idp/OP
Based on the above I am wondering if we should not remove hash
generation for SaToSa core entirely, and make it a (default on) micro
service so the behaviour may be controlled by configuration in
proxy-config.yaml
Cheers,
Niels
On 17-09-18 23:42, Ivan Kanakarakis wrote:
Hi everyone,
I proposed to discuss this at the bi-weekly idpy call (Tue 18/9 at
13:00 UTC). This is a bit complex, so I thought I'd do a writeup for
anyone that would like to discuss this.
I will try to describe how SATOSA handles the NameID value, and by
this I mean how the text value of the NameID node is transformed,
after it is received by the saml-backend (from an IdP), and before it
is passed over to the appropriate (saml or openid) frontend.
Omitting some details the flow is as follows: Once an authn-response
is received, the backend parses the response and extracts the NameID
node as `name_id` and its text value as `user_id`[0]. Control flow is
returned to the `base` module where `used_id` is hashed[1] before it
is passed to the response-microservices. Once microservices are done
doing their job, control is back in `base` module and the `user_id` is
hashed again[2]. Finally, the appropriate frontend module (saml[3],
openid[4]) is invoked and `user_id` is used as the basis for the
attribute that is needed.
What stands out here is the hashing of `user_id` (that happens twice).
This is something that has come up multiple times in different forms
(pull-requests #137 #160 #193 work around it). I decided to figure out
what's going on: How is that decision formed? Why does that take
place? Does it provide any value?
Hashing of `user_id` was introduced by commit b54006f [5].
Unfortunately, the accompanying commit message does not offer an
explanation or any hints on why this was needed. Looking at the
evolution of the code, nothing shows me why that was needed or what
value it added. A hint is that it is closely coupled to the
`UserIdHashType` enumeration[6] from `internal_data` module.
Unfortunately, the evolution of that part of the code also doesn't
help to reveal the need for both the enumeration and the hash.
In the hope that maybe a test that I did not see at the time could
reveal a reason for this behaviour, I went ahead and removed the
relevant code. You can see the changeset on my personal fork on the
branch named 'refactor-remove-user-id-hash' [7]. And nothing really
broke! At this point I do not see any value keeping the UserIdHashType
enumeration and hashing of `user_id`.
If someone has an insight on why hashing `user_id` was introduced and
what its purpose was, I would be interested to know. If we do not come
up with a valid use case, I would propose to move forward by removing
that part of the code.
[0]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/backends/s…
[1]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/base.py#L1…
[2]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/base.py#L1…
[3]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/frontends/…
[4]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/frontends/…
[5]:
https://github.com/IdentityPython/SATOSA/commit/b54006f
[6]:
https://github.com/IdentityPython/SATOSA/blob/d861e87/src/satosa/internal_d…
[7]:
https://github.com/c00kiemon5ter/SATOSA/compare/d861e87...refactor-remove-u…
Cheers,
--
Niels van Dijk Technical Product Manager Trust & Security
Mob: +31 651347657 | Skype: cdr-80 | PGP Key ID: 0xDE7BB2F5
SURFnet BV | PO.Box 19035 | NL-3501 DA Utrecht | The Netherlands
www.surfnet.nl www.openconext.org