Attendees: Johan W, Shayna, Enrique
Short meeting:
- pyFF - as to the recent discussions on pyFF issue 289
<https://github.com/IdentityPython/pyFF/issues/289>:
- in principle they will do the deduplication in thiss-mdq - one
discojson for each source and then merge those sources
- some of the changes that were merged to pyFF will need to be
revisited and possibly removed. The select pipe approach will not work
because entities have already been overwritten.
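As a rough illustration of the "one discojson for each source, then
merge" idea above, here is a minimal Python sketch. It is not thiss-mdq
code. The function name merge_discojson, the MULTI_VALUED tuple, the
added "sources" key and every field name other than entityID are
assumptions made for the example.

```python
# Hypothetical list of fields whose values are collected across sources.
MULTI_VALUED = ("entity_category", "registrationAuthority")


def merge_discojson(sources):
    """Merge per-source entity lists into one list, one entry per entityID.

    `sources` maps a source name to a list of entity dicts. Fields named
    in MULTI_VALUED are collected into a list of distinct values; every
    other field keeps the first value seen.
    """
    merged = {}
    for source_name, entities in sources.items():
        for entity in entities:
            entity_id = entity["entityID"]
            target = merged.setdefault(
                entity_id, {"entityID": entity_id, "sources": []}
            )
            # Record where each merged entry came from, so consumers can
            # tell that the data is merged.
            target["sources"].append(source_name)
            for key, value in entity.items():
                if key in MULTI_VALUED:
                    values = value if isinstance(value, list) else [value]
                    bucket = target.setdefault(key, [])
                    for item in values:
                        if item not in bucket:
                            bucket.append(item)
                else:
                    target.setdefault(key, value)
    return list(merged.values())
```

Keeping the list of contributing sources on each merged entry is one way
to make it visible downstream that the entry is a merge and not what any
single federation published.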
Attendees: Shayna, Mikael, Matthew, Alex Stuart, Enrique
Thank you to Enrique for providing a synopsis of some of the discussion of
pyFF issue 289, below.
0 - Agenda bash
1 - Project review
a. General -
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
c. Satosa - https://github.com/IdentityPython/SATOSA
- Matthew is working on the container refresh - trying to get default
configuration right. He is trying to decide on what to put in place of
SAMLtest.id.
d. pySAML2 - https://github.com/IdentityPython/pysaml2
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- There was much discussion on pyFF issue 289
<https://github.com/IdentityPython/pyFF/issues/289>
- There are two issues: distinguishing between name collisions and
the same entity in two different federations. It should not be hard to
distinguish name collisions with heuristics or distance measurements.
Confidence in the same entity in two different federations ?? They may
have the same title or same description.
- Take subsets of all the entities that the MDQ might know about
and create merged data with a merging strategy. It would be good to make
that data available, but make it clear that it is merged.
- Mikael states they use pyFF as an authoritative source. If they
select which entity is published then they will be making decisions for
their users. He would like a clearer policy on how the merge is done.
Right now, the "replace with the latest-read entity id" rule is hidden
in the code. There are provisions in the code for how to merge when
there are conflicts. We could invent a way to do the merge. Enrique
would like to look into this. Mikael states that whenever pyFF is run as
a daemon and the update endpoint is called, the provisions are applied.
Mikael is not sure how other people are using pyFF - eduGAIN uses pyFF -
how do they do aggregation? Alex says there is an algorithm using a
precedence rule, but he is not sure if that is using pyFF functionality
or if there is something additional written in Python which makes that
selection.
- Mikael would like a cleaner way to define how auto merging is
done, and Enrique would like to have all the data available. Maybe in
the select pipeline, there is some way to configure/indicate using
different lists. The aggregate is like a hard rule/limit, and the
discovery service is more a convenience approach. But if the entity is
not part of the discovery flow, it won't be used at all.
- Here are Enrique's notes summarizing the discussion, with a
potential solution via Alex. Thank you, Enrique! :
- There is a need to provide SeamlessAccess with filtering
capabilities based on entity data that is merged from different sources.
For example, we might want to filter IdP data based on registration
authority, which will be different in each metadata source; or based on
entity categories, that may or may not differ in different sources.
- We have been discussing in pyFF's issue #289 the possibility
of performing this merge in pyFF. However, we are reaching the consensus
that if we take this responsibility in pyFF, we would need to account
for all possible use cases that would benefit from this merged data, and
not just SeamlessAccess' use case. This will require more extensive
research and discussion among more interested parties, and much care
regarding communication and documentation.
- So one possibility to move forward with this, suggested by a
comment from Alex Stuart regarding the UK federation, would be to move
the responsibility of merging data to the downstream of pyFF in
SeamlessAccess, i.e., to thiss-mdq. This would entail using pyFF to
produce one discojson output for each of the metadata sources aggregated
by SeamlessAccess, and preparing thiss-mdq to consume all these outputs,
merging them in whatever way seems convenient for SeamlessAccess.
- We will discuss this within SeamlessAccess, and we'll be back
with whatever we conclude.
- Matthew asked where the pyFF documentation is. Mikael says
there are documents exported to readthedocs, and an old version of
inline documentation in the code that is not picked up by modern tools.
Mikael is trying to improve the documentation and test without breaking
anything.
- Matthew wanted to use pyFF for a private thiss.js deployment
but couldn't quite get it working. For the pipelines, it uses a
domain-specific language that is not easy to work with. We are stuck
with that right now unless there is a new major version. Without a
working example it's hard to get something going - but there are
examples in the unit tests.
- Instead of pyFF, the UK federation uses the Shibboleth MDA
framework. eduGAIN uses both pyFF and Shibboleth MDA. It's a complicated
framework to set up. The Spring config files are XML-based. In both
cases, the SAML pipelines refresh.
- Mikael is still working on upstreaming the OpenID Federation
code bases / Wallet repositories
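To picture what an explicit, configurable merge policy could look like
in place of the hidden "replace with the latest-read entity id" rule
discussed under pyFF above, here is a minimal Python sketch. It is not
pyFF code: the aggregate function, the policy names ("last_wins",
"first_wins", "error") and the shape of the input are all assumptions
made for illustration.

```python
POLICIES = ("last_wins", "first_wins", "error")


def aggregate(sources, policy="last_wins"):
    """Combine entities from several sources under a named policy.

    `sources` is a list of (source_name, {entityID: data}) pairs in load
    order. Returns the aggregated dict and a list of conflicts, each
    given as (entityID, source that held it, source that collided).
    """
    if policy not in POLICIES:
        raise ValueError("unknown policy: %r" % (policy,))
    result = {}
    origin = {}
    conflicts = []
    for source_name, entities in sources:
        for entity_id, data in entities.items():
            if entity_id in result:
                # Report every collision so it is never silent.
                conflicts.append((entity_id, origin[entity_id], source_name))
                if policy == "error":
                    raise ValueError("duplicate entityID: %s" % entity_id)
                if policy == "first_wins":
                    continue
            result[entity_id] = data
            origin[entity_id] = source_name
    return result, conflicts
```

With "last_wins" as the default, the outcome matches the behaviour the
notes describe today, while the returned conflict list gives an operator
something to log or act on.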
2 - AOB
Hi all,
This is what I remember from last Monday's meeting. It was Mikael, Ivan
and me, please both of you correct me on anything I misremember or
forget.
There was talk about pyFF's issue 291 [1] (Broken handling of # in
filenames and urls), Mikael said he was working on it - (or already had
a PR fixing it?)
Then Ivan told us he had released a couple of pysaml2 versions, and was
working on a further couple of PRs, but I forget the details, perhaps
you @Ivan can fill them in.
Finally there was some discussion on pyFF's issue 289 [2] (When an
entity is loaded from 2 sources, entity data from the 1st source is
lost). I started by showing some "proof of concept" code (just around 15
lines, see [3]) that addresses this issue. This is just a POC since:
* The issue is only addressed in MemoryStore; it would probably also
need to be addressed in RedisWhooshStore.
* The data is duplicated. In the current version, when entity data is
loaded from the sources, it is kept in 2 structures in the store:
`md`, which is a dictionary of md sources to lists of entityID's, and
`entities`, which is a dictionary of entityID's to entity data. In the
POC code, we add `md_entities`, which is a dictionary of md sources to
dictionaries of entityID's to entity data.
In the POC, the first 2 structures are still used for all the purposes
that they have ever been used for, and the new `md_entities` structure
is only used when the select pipe is configured with the new option
`dedup False` (the option defaults to `dedup True`).
But of course `md_entities` contains all the data that is in `md` and
`entities`, so we might think of removing the latter and using the
former for the purposes that the latter are used.
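A toy Python sketch of the three structures described above may help
picture them. This is not the POC code from [3] and not pyFF's
MemoryStore: the class SketchStore and its method names update and
select are hypothetical; only the names `md`, `entities`, `md_entities`
and `dedup` come from the description above.

```python
class SketchStore:
    """Toy stand-in for a metadata store holding the three structures."""

    def __init__(self):
        self.md = {}           # md source -> list of entityIDs
        self.entities = {}     # entityID -> entity data (last source wins)
        self.md_entities = {}  # md source -> {entityID -> entity data}

    def update(self, source, entity_id, data):
        ids = self.md.setdefault(source, [])
        if entity_id not in ids:
            ids.append(entity_id)
        # This assignment is where data from an earlier source is lost.
        self.entities[entity_id] = data
        # The per-source structure keeps one copy for each source.
        self.md_entities.setdefault(source, {})[entity_id] = data

    def select(self, dedup=True):
        if dedup:
            return list(self.entities.values())
        return [
            data
            for per_source in self.md_entities.values()
            for data in per_source.values()
        ]
```

With the same entityID loaded from two sources, select() returns one
entry (the one read last), while select(dedup=False) returns one entry
per source.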
A few concerns with this issue were discussed:
* How do consumers (an MDQ service) cope when they have duplicates in
the metadata and are asked for a particular entity? In thiss-mdq, when
the metadata is loaded, entities are deduplicated, but a number of
(multivalued) entity attributes are merged.
* What happens when one entityID present in 2 md sources corresponds to
different entities (name collisions)? This is a difficult problem, but
somewhat orthogonal to the issue, since it is also present in pyFF's
current form (currently, one of the entities would just disappear).
* Can this be abused, if federation A has less strict requirements for
some entity attribute than federation B? Yes, possibly; this would need
some risk assessment by the working group. Some of the metadata would
not be affected, for example registrationAuthority.
In the end, we agreed that more discussion is needed to reach a
definitive conclusion, and that any solution is going to carry problems
that can at most be mitigated but not fully solved.
Best regards,
1.- https://github.com/IdentityPython/pyFF/issues/291
2.- https://github.com/IdentityPython/pyFF/issues/289
3.- https://github.com/enriquepablo/pyFF/commit/0fb326d6043c1a3c6c2bb9a431cf4a9…
--
Enrique Pérez Arnaud
Hi
I have produced a minimal POC that would allow pyFF to load metadata
from different sources and produce a list of entities in which there may
be more than one entity with the same entityID (one for each source in
which it was present). It can be tested with the attachments in the
issue tracker.
1.- https://github.com/enriquepablo/pyFF/commit/0fb326d6043c1a3c6c2bb9a431cf4a9…
Best regards,
--
Enrique Pérez Arnaud
Attendees: Johan L, Shayna, Ivan, Mikael, Matthew, Enrique, Hannah
0 - Agenda bash
1 - Project review
a. General
- Moving project repos - all should be moving under IdPy, but who will
maintain the ones that are new and not going to be under other projects?
- Mikael will keep them floating - most (other than the ones being
added to Satosa) are considered POC, and Sunet and SWAMID will use them
as reference.
- Mikael will look into the process for bringing the repos under the
IdPy umbrella, described here:
https://github.com/IdentityPython/Governance/blob/master/idpy-projects.md
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
- Nikos will be putting up PRs with some new functionality.
- Everything should be under Roland's branch for the new repos
c. Satosa - https://github.com/IdentityPython/SATOSA
- Will be posting a new release after the call with the
ldap_attribute_store plugin updates.
d. pySAML2 - https://github.com/IdentityPython/pysaml2
- Will be creating a new release after the call to include:
- https://github.com/IdentityPython/pysaml2/pull/964
- https://github.com/IdentityPython/pysaml2/pull/897
- uses pydantic v1 but now we have pydantic v2, so want to make
sure there are no problems - there may be an issue with the Python
version. Ivan is testing with 3.13. Mikael knows there is a breakage
with pyFF with 3.13 that he thought might be related to pydantic.
- Next will look at some changes Giuseppe has prepared and is
using in his fork around namespace names.
- https://github.com/IdentityPython/pysaml2/pull/625
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- pyFF: Mikael will be taking a look at the hashmark issue mentioned in
the last meeting. Ivan is looking into this as well.
- Mikael and Enrique are collaborating on the issue Enrique described
last week.
2 - AOB
- Matthew had posted some things on Slack about the attribute mapper,
but was able to figure out what he needed.
- SAML defines attributes - they are not just an identifier. There is
the name, the friendly name, and the name format. The name format tells
you how the name is structured - it is not really a string. It could be
a URL or URI, for example. Within the name you could have a URI with a
hash symbol with a pointer, so you cannot just compare the values as
strings. Parsing the objects the right way may show they are the same.
The uniqueness of an attribute does not come from the name - you have to
combine it with the name format.
- Ivan will try to answer this on Slack and give some examples
- Matthew is currently working on signing outgoing SAML requests - it
is not working out of the box. He will gather his questions on this for
another time.
- Matthew is also working on how to structure tests for an application
that uses SAML, and uses JWTs after the SAML response. Would like to mock
up a real world application.
- Next goal is to be able to do integration testing, deploying an IdP
that facilitates that.
- Also doing all the same stuff with OpenID Connect. Still working
on getting the proper configuration.
- Next week, Shayna will be out and Matthew has volunteered to take
notes.
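On the attribute mapper point above (an attribute's uniqueness comes
from the name combined with the name format), a minimal Python sketch of
that comparison might look like this. The SamlAttribute type and the
helper names are hypothetical and not part of pysaml2; the NameFormat
URIs are the ones defined by SAML 2.0. The sketch deliberately leaves
out any format-specific parsing of the name (for example comparing
URIs), which the discussion suggests may also be needed.

```python
from typing import NamedTuple, Optional

NAME_FORMAT_UNSPECIFIED = "urn:oasis:names:tc:SAML:2.0:attrname-format:unspecified"
NAME_FORMAT_URI = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri"
NAME_FORMAT_BASIC = "urn:oasis:names:tc:SAML:2.0:attrname-format:basic"


class SamlAttribute(NamedTuple):
    name: str
    name_format: str = NAME_FORMAT_UNSPECIFIED
    friendly_name: Optional[str] = None


def identity_key(attr):
    """Identify an attribute by (name format, name); ignore friendly name."""
    return (attr.name_format, attr.name)


def same_attribute(a, b):
    return identity_key(a) == identity_key(b)
```

Under this sketch, two attributes that share a name but differ in name
format are treated as different attributes, and a differing friendly
name alone does not make them different.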
Attendees: Johan L, Shayna, Ivan, Mikael, Matthew, Enrique, Hannah, Alex
0 - Agenda bash
1 - Project review
a. General
- Discuss whether we should move some of Roland's project repos (which
have been moved under Sunet) to be under IdPy:
- SUNET/openid4v - ??
- SUNET/satosa-idpy - frontend - becomes part of SATOSA
- SUNET/satosa-openid4vci - extension to proxy - becomes MR under
SATOSA
- SUNET/fedservice - under idpy
- SUNET/idpy-sdjwt - under idpy
- Need time to understand - should probably slack Roland about
what changes they are introducing
- Also there is a document that explains the process for adopting the
code under IdPy which Ivan was going to share.
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
- New MR(s) with Nikos introducing changes around resource indicators/
audience policies. The internal fork is also being synced with the
upstream.
c. Satosa - https://github.com/IdentityPython/SATOSA
- trying to get to release for LDAP plugin changes
d. pySAML2 - https://github.com/IdentityPython/pysaml2
- Want to make sure latest releases will work together (both Satosa and
pySAML2)
- trying to get to release for typing changes for the entity
categories
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- Enrique mentioned an issue in pyFF, where entities in different
federations have the same entity id. One possibility is that it may be
the same entity registered to different federations, but also it could
be two different entities in different federations that have the same
entity id. Enrique has the use case in SeamlessAccess, where entities
are registered in two different federations, in different entity
categories. He needs to be able to merge certain entity attributes in
this case.
- The issue and Enrique's notes on it can be found here:
https://github.com/IdentityPython/pyFF/issues/289
- The way things are now, the entity in the first federation would
be overwritten by the one in the second federation.
- We may want to merge different entity categories, which may
differ from one federation to another.
- For filtering in mdq, we would need the attributes, which could
have different values in different federations, to be merged.
- Enrique's PR was merged to allow select pipe to be able to
produce a list of entities with duplicates. But the overwriting happens
before the select pipe - it happens in the load pipe - all the entities
from all the sources are squashed into one dictionary, keyed by entity
id.
- One idea would be to keep the entities keyed not only by entity
id but also by metadata service in the load pipe, but there is some
concern that this could introduce problems. When the user requests an
entity id, how would we know which one to choose? Could this be handled
in the MDQ service? Would the rest of the attributes give a fairly good
heuristic to distinguish between those cases? The select pipe can
default to doing it the way things currently work - overwriting the
first with the second - or use a heuristic downstream to either merge
the duplicated entities or keep them separate since they are different.
Mikael warns that we need to be careful not to change the default
heuristic - otherwise we might break all federations. A new heuristic
would need to be approved for each aggregator. We need to better define
what happens when we get conflicts so someone will be notified, instead
of just getting the last loaded. We will need a feature flag so we don't
break deployments.
- Ivan wants to establish if this is a current problem, or if we are
trying to think ahead to a potential problem, especially in the case
where the entity ids are the same in two different federations but the
actual entities are different. We should document that we make the
assumption that there will not be different entities in different
federations that will have the same entity id. However, if they are the
same entity but in different federations, there is an issue with the
entity exposing different things in each federation.
- Merging multi-value attributes: What if there is an entity with
an entity category in InCommon (say, "hide from discovery") that is not
available when that same entity is in eduGAIN? If we merge them,
something that was not intentional would happen. Need to examine these
things more thoughtfully before doing merging - some are based on
policy.
- pyFF - Microsoft Entra ID causes a problem with hash marks in
entity ids. pyFF doesn't support getting such an entity by URL, only by
sha1sum. pyFF refuses to load a file when hash marks appear in it.
- pyMDOC-CBOR
- will create an MR around a request for an improvement
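For the hash mark item above, the two usual ways of putting an entity id
into an MDQ-style request path can be sketched in Python as follows.
This only shows standard percent-encoding and the {sha1} identifier form
from the MDQ protocol; it does not claim anything about what pyFF
currently accepts. The function names and the example entity id are
hypothetical.

```python
import hashlib
from urllib.parse import quote


def mdq_path_for(entity_id):
    """Percent-encode the whole entity id, including '#', '/' and ':'."""
    return "/entities/" + quote(entity_id, safe="")


def mdq_sha1_path_for(entity_id):
    """Build the {sha1} form: the hex SHA-1 digest of the entity id."""
    digest = hashlib.sha1(entity_id.encode("utf-8")).hexdigest()
    return "/entities/" + quote("{sha1}" + digest, safe="")
```

An unencoded '#' in a URL starts the fragment, which clients do not send
to the server, so an entity id such as https://idp.example.org/#tenant
has to be encoded (the '#' becomes %23) or replaced by its digest before
it is used in a request path.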
2 - AOB
- Welcome to Alex - he has been working with Ivan looking at the
proxy/SATOSA, focusing on projects under GÉANT Core AAI (formerly
eduTEAMS). He has a lot of experience with different languages and
libraries, and has a lot of ideas on how we can approach things and
structure things better.
- Need to create a list of questions generated from Matthew's efforts to
write his own IdP / test SP, then work through a few of them at a time
weekly.