Hi all,
being part of Commons Conservancy brought up yet another subject,
which is whether we should add a header with license information in
every file in the projects under idpy. This is not something done in
an abstract way, there is a specific format modelling this information
(see https://spdx.org/ and https://reuse.software/ - more specifically
https://reuse.software/practices/2.0/) Still, I find it problematic.
We want to open up the question to the wider community and consider
their thoughts on this. The forwarded message below is discussing this
subject. You can see the question we posed, the answer we got and my
comments. Feel free to tell us what you think on this.
---------- Forwarded message ---------
Date: Thu, 16 May 2019 at 09:56
> ---------- Forwarded message ----------
> Date: May 8, 2019, 8:15 AM -0700
>
> > Why does CC think having a single license file per project is
> > insufficient? Our thought is that if we can avoid adding a header to
> > every single file, that would be nice, esp. given we already have this
> > info in the license file and we have the Note Well.
>
>
> this is not just our opinion, but something that is an industry and
> community standard for legal compliance these days. When companies like
> Siemens, Samsung or Honeywell use some code in one of the hundreds or
> thousands of devices and systems in their product line, they need to be
> able to provide the correct license and a download of the exact version.
> This means machine readability too.
>
I've actually observed the opposite of that. Communities abandon the
"license in every file" model, and just use a single LICENSE file in
the root of the project. The LICENSE file contains license
information, that is, it is not a single license but it has exception
sections and so on.
> To quote from https://reuse.software/practices/2.0/ :
>
> Scroll to the section "2. Include a copyright notice and license in each
> file"...
>
> "Source code files are often reused across multiple projects, taken from
> their origin and repurposed, or otherwise end up in repositories where
> they are separate from its origin. You should therefore ensure that all
> files in your project have a comment header that convey that file’s
> copyright and license information: Who are the copyright holders and
> under which license(s) do they release the file?
>
Continuing from above, the standardization of package-management
formats and tools has helped exactly with that: to avoid distribution
of single files, and instead provide packages and modules. It is bad
practice and considered a hack to copy files. Nobody liked that model
and everyone is moving away; it is unstructured, it becomes
unmanageable and it will cause problems.
> It is highly recommended that you keep the format of these headers
> consistent across your files. It is important, however, that you do not
> remove any information from headers in files of which you are not the
> sole author.
>
> You must convey the license information of your source code file in a
> standardised way, so that computers can interpret it. You can do this
> with an SPDX-License-Identifier tag followed by an SPDX expression
> defined by the SPDX specifications."
>
> (the text goes on for a while after this, to clarify the point but this
> is the basic gist of it)
>
> There is a nice Python tool to check:
>
> https://github.com/fsfe/reuse-tool
>
> I hope this makes sense
>
Well, it does not make complete sense. We're talking about licensing a
project. A project is not just code; there are data files (html, xml,
yaml, json files), binary files (archives/zip, images, audio, video,
etc), text files (configs, ini-files, etc) all "not-code". How do you
mark those files? Does the LICENSE file need a license-header? The
json format does not define comments, how do you add a header there?
If a binary file does not get a license header, why should a file with
code get one?
I would expect there to be a way to have the needed information
unified. If the files themselves cannot provide this information it
has to be external; thus the LICENSE file. If someone is worried about
somebody else re-using single files that do not have license
information (a python file, a png image, etc) there is really nothing
you can do (the DRM industry has been trying to solve for a long time;
and still your best bet is "social DRM").
Since, we're developing on open source with a permissive license, even
if someone does that, should we be happy that someone is actually
using what we built or sad that the files they copied did not have a
license header? And if they include the license information of that
copied file in their project's LICENSE file, is this solved?
Having pointed these contradictions, I am thinking that the "license
in every file" model seems to be a step backwards. It is introducing
overhead and does not really solve the problem, while at the same time
it enables a culture of bad practice (copying files around).
Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
Hello everyone,
there has been a report on incident-response at idpy.org about a security
issue in PySaml2.
Alexey Sintsov and Yuri Goltsev from HERE Technologies reached out and
reported a XML Signature Wrapping (XSW) vulnerability. The issue
affects responses with signed assertions. PySaml2 can be tricked to
think that an assertion had been signed and use the assertion
information, when in reality the Signature points to another part of
the xml document that is controlled by another party.
The issue was assigned CVE-2020-5390 and is now fixed in the latest
pysaml2 release.
The relevant code commit that fixes is the issue:
https://github.com/IdentityPython/pysaml2/commit/5e9d5acbcd8ae45c4e736ac521…
Release v5.0.0 contains more changes, including:
- Add freshness period feature for MetaDataMDX
- Fix ipv6 validation to accommodate for addresses with brackets
- Fix xmlsec temporary files deletions
- Add method to get supported algorithms from metadata
- Add mdstore method to extract assurance certifications
- Add mdstore method to extract contact_person data
- Start dropping python2 support
Pointers to the release with changelog and more information, below:
- the relevant release commit:
https://github.com/IdentityPython/pysaml2/commit/f27c7e7a7010f83380566a219f…
- the github release:
https://github.com/IdentityPython/pysaml2/releases/tag/v5.0.0
- the pypi package:
https://pypi.org/project/pysaml2/5.0.0/
+ + + + + + + +
In more detail, regarding the XSW vulnerability:
libxml2 follows the xmldsig-core specification. The xmldsig
specification is way too
general. saml-core reuses the xmldsig specification, but constrains it to use of
specific facilities. The implementation of the SAML specification is
responsible to
enforce those constraints. libxml2/xmlsec1 are not aware of those
constraints and thus
process the document based on the full/general xmldsig rules.
What is happening is the following:
- xmldsig-core allows the signature-information and the data that was
signed to be in
different places. This works by setting the URI attribute of the
Reference element.
The URI attribute contains an optional identifier of the object
being signed. (see
"4.4.3 The Reference Element" --
https://www.w3.org/TR/xmldsig-core1/#sec-Reference)
This identifier is actually a pointer that can be defined in many
different ways; from
XPath expressions that need to be executed(!), to a full URL that
should be fetched(!)
in order to recalculate the signature.
- saml-core section "5.4 XML Signature Profile" defines constrains on
the xmldsig-core
facilities. It explicitly dictates that enveloped signatures are the
only signatures
allowed. This mean that:
* Assertion/RequestType/ResponseType elements must have an ID attribute
* signatures must have a single Reference element
* the Reference element must have a URI attribute
* the URI attribute contains an anchor
* the anchor points to the enclosing element's ID attribute
xmlsec1 does the right thing - it follows the reference URI pointer
and validates the
assertion. But, the pointer points to an assertion in another part of
the document; not
the assertion in which the signature is embedded/enveloped. SAML
processing thinks that
the signature is fine (that's what xmlsec1 said), and gets the
assertion data from the
assertion that contains the signature - but that assertion was never
validated. The
issue is that pysaml2 does not enforce the constrains on the signature
validation
facilities of xmldsig-core, that the saml-core spec defines.
The solution is simple; all we need is to make sure that assertions
with signatures (1)
contain one reference element that (2) has a URI attribute (3) that is
an anchor that
(4) points to the assertion in which the signature is embedded. If
those conditions are
met then we're good, otherwise we should fail the verification.
--
Ivan c00kiemon5ter Kanakarakis >:3
Hi Leif,
I added 2 pipes to buildin.py:
- publish_html creates static HTML views of IDPs and SPs, using XSLT based on Peter Schober’s alternative to MET;
- publish_split: similar to store, but added validUntil and creates signed XML-file per EntityDescriptor. This can be consumed dynamically by ADFS in an IDP role.
I put it directly into buildin.py because it shares some code with the sign pipe. Is this viable from your PoV - if yes, I would make an PR.
Cheers, Rainer
*Idpy meeting 27 January 2025*
Attendees: Johan L, Shayna, Ivan, Mikael, Matthew, Enrique, Hannah
0 - Agenda bash
1 - Project review
a. General -
- Documentation pain points run-through / mock saml workflow / mock oidc
workflow
- Matthew has a github repo that mocks up a SAML authentication flow
using pytest. Pysaml is needed for a client to write more
than one identity
provider, and a service provider to test those identity providers -
https://github.com/xenophonf/mock-saml-flow
- Has a configuration for IdP and SP from his Satosa
configuration.
- There is some documentation for the configuration, but one
challenge is the configuration is entirely a mapping. There's
no typing
hints, things aren't discoverable in his development environment.
- Doesn't want to get into Flask, just wants to concentrate on
using what paysaml2 calls a Server and a Client. However, there is no
documentation on the classes and methods to use.
- No API documentation
- the simple examples included in code base just tell you how to
run it; not how to use the code
- one option is to go to the example code and try to figure out
how to develop an SP or IdP, but reading the source code is
really hard.
- what is repoze?
- it's not clear where to start. First guess is to start in idp_
uwsgi.py, skimming for things like url routes (where do
authentication requests come in/go out?). Only able to
find a SSO class.
How is this invoked?
- Conclusion: it takes too long to figure out how to find the
basics of what you need to know.
- Another option - test suite. Not successful here either.
- Next option - reading through Satosa code. Found Server class.
There are no type annotations, no document string for the
method to create.
There is some document strings for the other functions, some
params, some
information, but not enough to know what to use when. Found the
create_authn_response() method but there is no information on
what should
be provided for a good saml response or what structure it
should follow -
the only way to get this information is trial and error.
- idpy-oidc documentation looked more promising
- Encouraged by newer style for the documentation page.
- Wanted to write a small RP - looked at client documentation
- There is a workflow, describing the process and a high level
overview of how oidc RPs work. Very helpful.
- API documentation is lacking. No method signatures. For
example, is issuer_id the only argument the begin() method
takes? But the
Tier1 API design is good.
- No examples on how to instantiate the IdP
- Left reading through the source code again, trying to intuit
from example code and test code how to do it.
- When looking to write code to have a test OP (to test RP) -
Server code - the documentation only tells you how the
configuration
directives work. No API calls, no example code, no examples of
instantiation.
- SQLAlchemy ORM is an example of great documentation on how to
get going - type annotations, explaining what each parameter
is for, etc.
- For mock saml - took 2 months to write 179 lines of code
- Kushal has tried to document some examples -
https://kushaldas.in/learningsaml - unfortunately this starts from
a place beyond where Matthew's knowledge is
- Ivan is the person Matthew and Hannah will need to talk to to
get their project going. First focus on SAML backend for Service, SAML
Frontend for Idp. This won't give the complete picture but
it's a start.
- For Service:
https://github.com/IdentityPython/SATOSA/blob/master/src/satosa/backends/sa…,
especially authn_request and authn_response
- For an IDP:
https://github.com/IdentityPython/SATOSA/blob/master/src/satosa/frontends/s…,
esp handle_authn_request and _handle_authn_response
- Ivan acknowledges there is not developer documentation
- initial step to address docstrings
- Need to define public API methods and documentation - this
would 90-95% of what you can do with SAML
- The examples are old, and sometimes create confusion (for
example, the Server in pysaml2 is an IdP, the Server in
examples is a wsgi
server, the Server in Satosa is a proxy server).
- A referenced example against a SAML trace would be very
helpful
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
c. Satosa - https://github.com/IdentityPython/SATOSA
- LDAP plugin release coming
- Matthew will have a new SATOSA Docker image out new week
d. pySAML2 - https://github.com/IdentityPython/pysaml2
- release coming to address xml enc changes and introducing types around
entity categories from Frederik and Johan.
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- Roland's code has been moved under Sunet - mostly related to openid
federation and credential/wallet. Let's have a discussion whether some of
these need to be moved under IdPy
- SUNET/openid4v
- SUNET/satosa-idpy
- SUNET/satosa-openid4vci
- SUNET/fedservice
- SUNET/idpy-sdjwt
2 - AOB
3 - Action items
- pull out questions Matthew raised one by one and document them
- For example some classes were machine generated based on schema
changes (Matthew suspected this) - perhaps we should add docstrings to
explain how/when this was don
Attendees: Johan L, Shayna, Ivan, Mikael, Hannah, Matthew
0 - Agenda bash
1 - Project review
a. General -
- Find a new time slot for a weekly meeting - Mondays 12-13 UTC
- Find a time slot for a review of "documentation pain points"
meeting with Matthew E -we will take time from the 27 January meeting
- Ivan will send a message to the board, to discuss transitioning to
a different model (running services)
b. OIDC libraries - https://github.com/IdentityPython (idpy-oidc,
JWTConnect-Python-CryptoJWT, etc)
- Roland - discussion of PRs in stages vs one big one - Ivan will reach
out to him again. Mikael reports there is a workshop going on
about handing
over the keys to the kingdom.
- there will be a PR coming regarding audience policies
- before the open id connect frontend/library can release a new
token, and include an audience as a claim in the token, this
audience can
be setup with certain rules. There is a specification in the
RFC that the
client can request certain audiences can be included in the
token, in the
audience claim, using the resource indicator.
- How the OP will decide if they will do this or not is more
complicated. The OP will process the request and will see the
client wants
the audiences in the auth claim, so it applies a filter. But
what if it
doesn't? How does it signal that it didn't respect what the client
requested? No matter what, ultimately the OP will decide
which audiences go
into the auth claim.
- But what if the policy changes over time? The OP may be reloaded
with a new policy. The client will use a token or give one to
an RS, and we
need to take into consideration that the policy may have changed when
replying from the different endpoints where the audience can have
participation.
- What happens if we can't refresh our access token because a
policy changed? What happens if you are allowed to do more
things than were
requested? The potential change over time spans is the tricky
part, and may
cause the OP to deny a request.
- Need to introduce a hook in the right places where the audience
will be set/returned/displayed to apply the policy needed.
- Nikos has gone through the PR Roland opened - looks good and was
tested - will be merged.
c. Satosa - https://github.com/IdentityPython/SATOSA
- Ivan (speaking for both SATOSA and pysaml2) will work on the next
stages of release, incorporating different categories of PRs -
he has some
changes locally but hasn't pushed them yet
- username, ldap plugins
- Johan's request for typings
- support xml encoding version 1.1 (?)
d. pySAML2 - https://github.com/IdentityPython/pysaml2
- Making sure compatible with python 3.13 - should do this with all
code. Has been addressed for pyff (see below)
e. Any other project (pyFF, djangosaml2, pyMDOC-CBOR, etc)
- Conversations with Enrique around pyff - need discussion with spec
authors around trustinfo elements- Matthew Stewart will ask the working
group for their help to expand on the specification and
capabilities. There
are some questions about processing the list of preferred/exclusively
trusted identity providers, done through the services metadata or another
structure - a JSON payload with a list of sets of IdPs. During the
processing of the list, it is possible that what has already
been processed
is getting overwritten. It shouldn't happen that the same list should be
entered twice. How should this be handled? Should it be up to the
implementers; should we throw an error or a warning? Need to
determine how
to handle these edge cases. Should this invalidate the service
such that we
don't continue to process its metadata? Should we skip the
repeated entity,
use the last entity, etc?
- Mikael added a github action to make sure they can build pyff with
latest python release. This should probably be done for all code.
2 - AOB
- Change meeting invite and ensure Enrique and Alex are added
- Matthew is going to post his mock SAML workflow writeup to the
idpy-discuss list. He is doing a similar writeup for the idpy-oidc
libraries. It is a narrow example of the authentication flow,
mocking up an
OP and RP, simulating a request and response and processing the response.
He may be able to share some of that with us at the next meeting. It is
using pytest but not simulating things like a web browser.