Hi all,
Being part of the Commons Conservancy brought up yet another subject:
whether we should add a header with license information to
every file in the projects under idpy. This is not something done in
an abstract way; there is a specific format modelling this information
(see https://spdx.org/ and https://reuse.software/, more specifically
https://reuse.software/practices/2.0/). Still, I find it problematic.
We want to open up the question to the wider community and consider
their thoughts on this. The forwarded message below discusses this
subject. You can see the question we posed, the answer we got, and my
comments. Feel free to tell us what you think.
---------- Forwarded message ---------
Date: Thu, 16 May 2019 at 09:56
> ---------- Forwarded message ----------
> Date: May 8, 2019, 8:15 AM -0700
>
> > Why does CC think having a single license file per project is
> > insufficient? Our thought is that if we can avoid adding a header to
> > every single file, that would be nice, esp. given we already have this
> > info in the license file and we have the Note Well.
>
>
> this is not just our opinion, but something that is an industry and
> community standard for legal compliance these days. When companies like
> Siemens, Samsung or Honeywell use some code in one of the hundreds or
> thousands of devices and systems in their product line, they need to be
> able to provide the correct license and a download of the exact version.
> This means machine readability too.
>
I've actually observed the opposite. Communities abandon the
"license in every file" model and just use a single LICENSE file in
the root of the project. That LICENSE file contains all the license
information: it is not necessarily a single license, but it can have
exception sections and so on.
> To quote from https://reuse.software/practices/2.0/ :
>
> Scroll to the section "2. Include a copyright notice and license in each
> file"...
>
> "Source code files are often reused across multiple projects, taken from
> their origin and repurposed, or otherwise end up in repositories where
> they are separate from its origin. You should therefore ensure that all
> files in your project have a comment header that convey that file’s
> copyright and license information: Who are the copyright holders and
> under which license(s) do they release the file?
>
Continuing from above, the standardization of package-management
formats and tools has helped with exactly that: avoiding distribution
of single files and providing packages and modules instead. Copying
files around is bad practice and considered a hack. Nobody liked that
model and everyone is moving away from it; it is unstructured, it
becomes unmanageable, and it causes problems.
> It is highly recommended that you keep the format of these headers
> consistent across your files. It is important, however, that you do not
> remove any information from headers in files of which you are not the
> sole author.
>
> You must convey the license information of your source code file in a
> standardised way, so that computers can interpret it. You can do this
> with an SPDX-License-Identifier tag followed by an SPDX expression
> defined by the SPDX specifications."
>
> (the text goes on for a while after this, to clarify the point but this
> is the basic gist of it)
>
> There is a nice Python tool to check:
>
> https://github.com/fsfe/reuse-tool
>
> I hope this makes sense
>
Well, it does not make complete sense. We're talking about licensing a
project. A project is not just code; there are data files (HTML, XML,
YAML, JSON), binary files (archives/zip, images, audio, video,
etc.), and text files (configs, ini-files, etc.), all "not-code". How do
you mark those files? Does the LICENSE file need a license header? The
JSON format does not define comments, so how do you add a header there?
If a binary file does not get a license header, why should a file with
code get one?
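To make the JSON point concrete: a REUSE-style header is just a comment carrying an `SPDX-License-Identifier` tag, so a scanner that reads the file itself can only find it in formats that allow comments. The sketch below is a hypothetical helper (not part of reuse-tool) that looks for that tag in the first lines of a file; `Apache-2.0` and the copyright line are only example values.

```python
import re
from typing import Optional

# Matches e.g. "# SPDX-License-Identifier: Apache-2.0" on any line and captures
# the SPDX expression that follows the tag (trailing spaces are dropped).
SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:[ \t]*(?P<expr>\S.*?)[ \t]*$", re.MULTILINE
)


def spdx_expression(text: str, max_lines: int = 20) -> Optional[str]:
    """Return the SPDX expression declared near the top of a file, if any.

    Hypothetical helper for illustration: only the first max_lines lines are
    scanned, because the header is expected at the top of the file.
    """
    head = "\n".join(text.splitlines()[:max_lines])
    match = SPDX_TAG.search(head)
    return match.group("expr") if match else None


python_file = (
    "# SPDX-FileCopyrightText: 2019 Example Contributor\n"
    "# SPDX-License-Identifier: Apache-2.0\n"
    "\n"
    "import os\n"
)
json_file = '{"entityID": "https://idp.example.org"}\n'

print(spdx_expression(python_file))  # Apache-2.0
print(spdx_expression(json_file))    # None: JSON has no comment syntax for a header
```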
I would expect there to be a way to keep the needed information
unified. If the files themselves cannot provide this information, it
has to be external; thus the LICENSE file. If someone is worried about
somebody else re-using single files that do not carry license
information (a Python file, a PNG image, etc.), there is really nothing
you can do (the DRM industry has been trying to solve this for a long
time, and still your best bet is "social DRM").
Since we're developing open source software under a permissive license,
even if someone does that, should we be happy that someone is actually
using what we built, or sad that the files they copied did not have a
license header? And if they include the license information of the
copied file in their project's LICENSE file, is the problem solved?
Having pointed out these contradictions, I think that the "license
in every file" model is a step backwards. It introduces
overhead and does not really solve the problem, while at the same time
it enables a culture of bad practice (copying files around).
Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
Hello everyone,
there has been a report on incident-response at idpy.org about a security
issue in PySaml2.
Alexey Sintsov and Yuri Goltsev from HERE Technologies reached out and
reported an XML Signature Wrapping (XSW) vulnerability. The issue
affects responses with signed assertions. PySaml2 can be tricked into
thinking that an assertion has been signed and into using the assertion
information, when in reality the Signature points to another part of
the XML document that is controlled by another party.
The issue was assigned CVE-2020-5390 and is now fixed in the latest
pysaml2 release.
The commit that fixes the issue:
https://github.com/IdentityPython/pysaml2/commit/5e9d5acbcd8ae45c4e736ac521…
Release v5.0.0 contains more changes, including:
- Add freshness period feature for MetaDataMDX
- Fix ipv6 validation to accommodate for addresses with brackets
- Fix xmlsec temporary files deletions
- Add method to get supported algorithms from metadata
- Add mdstore method to extract assurance certifications
- Add mdstore method to extract contact_person data
- Start dropping python2 support
Pointers to the release with changelog and more information, below:
- the relevant release commit:
https://github.com/IdentityPython/pysaml2/commit/f27c7e7a7010f83380566a219f…
- the github release:
https://github.com/IdentityPython/pysaml2/releases/tag/v5.0.0
- the pypi package:
https://pypi.org/project/pysaml2/5.0.0/
+ + + + + + + +
In more detail, regarding the XSW vulnerability:
libxml2/xmlsec1 follow the xmldsig-core specification, which is very
general. saml-core reuses the xmldsig specification but constrains it
to the use of specific facilities. The implementation of the SAML
specification is responsible for enforcing those constraints.
libxml2/xmlsec1 are not aware of those constraints and thus process the
document according to the full, general xmldsig rules.
What is happening is the following:
- xmldsig-core allows the signature information and the signed data to
  be in different places. This works by setting the URI attribute of the
  Reference element. The URI attribute contains an optional identifier
  of the object being signed (see "4.4.3 The Reference Element" --
  https://www.w3.org/TR/xmldsig-core1/#sec-Reference). This identifier
  is actually a pointer that can be defined in many different ways: from
  XPath expressions that need to be executed(!) to a full URL that
  should be fetched(!) in order to recalculate the signature.
- saml-core section "5.4 XML Signature Profile" defines constraints on
  the xmldsig-core facilities. It explicitly dictates that enveloped
  signatures are the only signatures allowed. This means that:
  * Assertion/RequestType/ResponseType elements must have an ID attribute
  * signatures must have a single Reference element
  * the Reference element must have a URI attribute
  * the URI attribute contains an anchor
  * the anchor points to the enclosing element's ID attribute
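The different forms a Reference URI can take, and the single form the SAML profile permits, can be sketched as a small classifier. This is a hypothetical helper for illustration, not pysaml2 code; it only looks at the syntax of the URI value and never dereferences anything.

```python
from typing import Optional


def classify_reference_uri(uri: Optional[str]) -> str:
    """Roughly classify a ds:Reference URI value (xmldsig-core, section 4.4.3)."""
    if uri is None:
        # No URI: the application must identify the signed data some other way.
        return "omitted"
    if uri == "":
        # Empty URI: the whole document that contains the signature.
        return "whole-document"
    if uri.startswith("#xpointer("):
        # XPointer expression that has to be evaluated against the document.
        return "xpointer"
    if uri.startswith("#"):
        # Bare-name fragment: a same-document reference to an ID attribute.
        return "id-anchor"
    # Anything else is an external resource (e.g. an http(s) URL) to be fetched.
    return "external"


def allowed_by_saml_profile(uri: Optional[str]) -> bool:
    """saml-core 5.4 only permits a same-document anchor to an ID."""
    return classify_reference_uri(uri) == "id-anchor"


for value in [None, "", "#xpointer(/)", "#_a1b2c3", "https://example.org/data.xml"]:
    print(repr(value), classify_reference_uri(value), allowed_by_saml_profile(value))
```

Note that the anchor check alone is not enough: the anchor must also point at the element that encloses the signature, which is the part the fix below addresses.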
xmlsec1 does the right thing: it follows the Reference URI pointer and
validates the assertion. But the pointer points to an assertion in
another part of the document, not the assertion in which the signature
is embedded/enveloped. SAML processing thinks that the signature is fine
(that's what xmlsec1 said) and gets the assertion data from the
assertion that contains the signature, but that assertion was never
validated. The issue is that pysaml2 does not enforce the constraints
that the saml-core spec places on the signature validation facilities
of xmldsig-core.
The solution is simple: all we need is to make sure that assertions with
signatures (1) contain one Reference element that (2) has a URI
attribute (3) that is an anchor that (4) points to the assertion in
which the signature is embedded. If those conditions are met, we're
good; otherwise, we should fail the verification.
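The four checks above can be sketched with the standard library's ElementTree. This is a minimal illustration, not the actual pysaml2 fix (see the linked commit for that): the function name is hypothetical, it assumes the ds:Signature is a direct child of the saml:Assertion (the enveloped form the SAML profile requires), and it only checks structure; the cryptographic validation is still xmlsec1's job.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def signature_envelops_assertion(assertion: ET.Element) -> bool:
    """Check that an assertion's signature can only cover the assertion itself.

    Hypothetical helper for illustration; it does not verify the signature.
    """
    assertion_id = assertion.get("ID")
    if not assertion_id:
        return False
    # Enveloped signature: ds:Signature must be a direct child of the assertion.
    signature = assertion.find(f"{{{DS_NS}}}Signature")
    if signature is None:
        return False
    references = signature.findall(f"{{{DS_NS}}}SignedInfo/{{{DS_NS}}}Reference")
    if len(references) != 1:          # (1) exactly one Reference element
        return False
    uri = references[0].get("URI")
    if uri is None:                   # (2) a URI attribute is present
        return False
    if not uri.startswith("#"):       # (3) the URI is a same-document anchor
        return False
    return uri[1:] == assertion_id    # (4) it points to this very assertion


def make_assertion(reference_uri: str) -> ET.Element:
    """Build a toy signed assertion whose ID is "_legit" (no real signature)."""
    xml = (
        f'<saml:Assertion xmlns:saml="{SAML_NS}" xmlns:ds="{DS_NS}" ID="_legit">'
        f'<ds:Signature><ds:SignedInfo><ds:Reference URI="{reference_uri}"/>'
        "</ds:SignedInfo></ds:Signature></saml:Assertion>"
    )
    return ET.fromstring(xml)


good = make_assertion("#_legit")
wrapped = make_assertion("#_elsewhere")  # XSW: the signature covers another element
print(signature_envelops_assertion(good))     # True
print(signature_envelops_assertion(wrapped))  # False
```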
--
Ivan c00kiemon5ter Kanakarakis >:3
Hi Leif,
I added 2 pipes to buildin.py:
- publish_html creates static HTML views of IDPs and SPs, using XSLT based on Peter Schober’s alternative to MET;
- publish_split: similar to store, but adds validUntil and creates a signed XML file per EntityDescriptor. This can be consumed dynamically by ADFS in an IDP role.
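The splitting part of publish_split might look roughly like the sketch below. It only illustrates the idea with the standard library: the function name and validity period are assumptions, it is not the pyFF pipe API, and signing each document (the part shared with the sign pipe) is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Dict

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ET.register_namespace("md", MD_NS)  # keep the conventional "md" prefix on output


def split_entities(aggregate_xml: str, valid_for: timedelta) -> Dict[str, str]:
    """Split an EntitiesDescriptor aggregate into one document per entity.

    Returns a mapping from entityID to the serialized EntityDescriptor, each
    stamped with a validUntil attribute. Hypothetical helper; signing omitted.
    """
    root = ET.fromstring(aggregate_xml)
    valid_until = (datetime.now(timezone.utc) + valid_for).strftime("%Y-%m-%dT%H:%M:%SZ")
    documents = {}
    for entity in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
        entity.set("validUntil", valid_until)
        documents[entity.get("entityID")] = ET.tostring(entity, encoding="unicode")
    return documents


AGGREGATE = (
    f'<md:EntitiesDescriptor xmlns:md="{MD_NS}">'
    '<md:EntityDescriptor entityID="https://idp.example.org/idp"/>'
    '<md:EntityDescriptor entityID="https://sp.example.org/sp"/>'
    "</md:EntitiesDescriptor>"
)

for entity_id, xml in split_entities(AGGREGATE, timedelta(days=4)).items():
    print(entity_id, xml)
```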
I put it directly into buildin.py because it shares some code with the sign pipe. Is this viable from your PoV? If so, I would make a PR.
Cheers, Rainer
Hi all,
As far as I understand, 'sub' is the standard claim for releasing a
user identifier value when you use the OpenID Connect frontend. But even
though the subject_id is properly set in the internal request, I cannot
get the frontend to leave this value unchanged when it creates the
id_token. Is there a way to do this?
Or how do others release the user's identifier using OIDC? I can do it
using 'email' or a custom claim, but none of this seems to be The Right
Way.
However, I'm inclined to think that using 'sub' for the user identifier
is overly complex, because I want to be able to populate the subject_id
from a SAML2 NameID (on the backend side), which I can only do with a
response microservice, but I'm not confident that changing the
'subject_id' in a microservice is a safe and supported operation.
Thanks for any insight,
Kristof
Hi!
Having refactored the Session/Grant management, I thought I would just add some documentation.
Famous last words…
We really, really need to rework the documentation.
This isn’t the first time that someone has said this.
It will probably not be the last time either….
But let’s start small.
Could we at least agree on a layout, a set of chapters, and their order?
Maybe we could begin with deciding on who we are writing for.
Surely we will have readers who come to us with different backgrounds/needs:
- An architect, who wants a bird’s-eye view of what the system can or can’t do
- A service provider, who wants a set of steps that will get a server running.
- An apps implementer who wants to know how to interface with the client API and use Google/Github/.. as the OP.
- A developer that wants to add another endpoint/service to the system.
There are probably more.
If we could agree on who we write for we could perhaps start there.
Maybe we should have a set of entry points depending on who you (the reader) are.
Something like the list above.
If I were connected to/employed by a big organisation, I could imagine finding people who fit the
profiles above and asking them what they want to know (the depth of knowledge they want), but I’m not, so
that’s not going to happen.
— Roland
Were it left to me to decide whether we should have a government without newspapers, or newspapers without a government, I should not hesitate a moment to prefer the latter. -Thomas Jefferson, third US president, architect, and author (1743-1826)
(And now for today’s notes)
Attendees:
Johan, Heather, Ivan, Matthew
Regrets:
Scott, Roland
Agenda:
0 - Agenda bash
1 - Architecture
We talked about starting to look forward to the SSI stuff. This needs prep work to gather the specs, setting out what projects we want to build, what projects we want to get from others, and assemble the base layer so we can do things like evolve Satosa. This will start to get into shape after the NORDUnet conference in mid-September.
2 - Documentation
(From Roland)
Could we at least agree on a layout, a set of chapters, and their order?
Maybe we could begin with deciding on who we are writing for.
Surely we will have readers who come to us with different backgrounds/needs:
- An architect, who wants a bird’s-eye view of what the system can or can’t do
- A service provider, who wants a set of steps that will get a server running.
- An apps implementer who wants to know how to interface with the client API and use Google/Github/.. as the OP.
- A developer that wants to add another endpoint/service to the system.
Matthew suggests we start with implementers, people who are trying to deploy apps that need federated login help. The examples in the repo aren't sufficient to get there. We need to make it easier for them to come in and use our tools. There's no obvious way to make an IdP object, and it's not clear how to use the classes and methods correctly. Example: the authentication context in an IdP response. It was not obvious that he needed to provide it in the first place, nor how to structure it (until he looked at the source code itself; it wasn't in the in-line help). What would be most immediately helpful would be adding more detail to the method doc strings. If we could do this, more people could address the immediate documentation needs incrementally.
Ivan suggests our tools are low-level enough that the current audience is people who understand them, and that would be implementers. But if we build packages that integrate with well-known frameworks, that should shortcut a lot of documentation: people would use the tools through the layer of the framework. The adapter trims the API surface to a more constrained set of default behaviours. The rest of what can be done is not exposed there; to use it, you need a more in-depth understanding of the libraries, and then we need the more low-level, developer-oriented documentation.
Johan agrees with Matthew that the smaller efforts will get immediate wins. The new OIDC code is documented like the old code, which means the documentation is wrong for the new libraries. Focusing on the functions will be the easiest way to get something done now.
Ivan agrees: we can start here, then move on to restructuring things. One limit of focusing on method doc strings: we can tell what the function is doing, but it misses the context of why the function is doing it. We need both types of documentation.
Let's focus on requesting doc-string updates and creating documentation for the following questions:
• how to create a custom SAML response,
• how to validate a signature,
• how to create a SAML request,
• how to get custom metadata.
Matthew will work on documentation for the functions he's been working on most recently (stuff he touched for the test IdP), create a small PR, and we can see how that works.
Ivan: we could work on adding typing information. That will be complicated in pySAML2; it would need to be added in the code, not the doc string. Ivan is working on this, but it's slow going.
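As a sketch of the split discussed here, the helper below keeps the types in the code as annotations while the doc string says what the value is. It is hypothetical (not part of pySAML2); the `class_ref`/`authn_auth` keys mirror the dictionary shape pySAML2's IdP code and tests appear to use for the authentication context, which is an assumption to check against the source.

```python
from typing import Optional, TypedDict

# Authentication context class URI from the SAML 2.0 authn-context specification.
PASSWORD_PROTECTED_TRANSPORT = (
    "urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport"
)


# Authentication context to include in an IdP response (assumed key names).
class AuthnContext(TypedDict, total=False):
    class_ref: str   # URI of the authentication context class that was used
    authn_auth: str  # entity ID of the authenticating authority, if any


def make_authn_context(class_ref: str, authority: Optional[str] = None) -> AuthnContext:
    """Describe how the user authenticated, for use in an authentication response.

    Args:
        class_ref: URI of the SAML authentication context class, for example
            PASSWORD_PROTECTED_TRANSPORT.
        authority: optional entity ID of the authenticating authority.

    Returns:
        A dict with a ``class_ref`` key and, when given, an ``authn_auth`` key.
    """
    context: AuthnContext = {"class_ref": class_ref}
    if authority is not None:
        context["authn_auth"] = authority
    return context


print(make_authn_context(PASSWORD_PROTECTED_TRANSPORT, "https://idp.example.org/idp"))
```

The "how to use it" part (when an IdP response needs this, and which class to pick) would then live in the how-to pages rather than in the doc string.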
3 - GitHub review
a. OIDC - https://github.com/IdentityPython (JWTConnect-Python-OidcRP, JWTConnect-Python-CryptoJWT, etc)
No update
b. Satosa - https://github.com/IdentityPython/SATOSA
• SLO - Hannah is working on this; she has it working from the perspective of an SP. If an SP requests a logout, that request will go to Satosa and back down to all the other SPs someone is logged into, then back up to the originating IdP. There were some problems getting the logout requests to all the downline SPs that support it, but those seem to be addressed. She is now working on the IdP-originated logout and cleaning up the code to submit a PR. She will hopefully have a presentation about this ready next month (after the NORDUnet conference).
Architecture ideas for Satosa: we'd talked about switching to a more well-known framework, probably FastAPI. Along with that big change, we could also make a few other decisions. Ivan is leaning towards using lxml exclusively (not the built-in Python XML parsers). We have long-standing questions about namespace prefixes: the Python built-in parsers rewrite the prefixes in the XML document and put in their own. With lxml we can keep the document's namespace prefixes and treat them correctly, which helps with validating signatures properly. The schema validator we use is xmlschema, which relies on an underlying library called elementpath. They are well made and useful and support lxml, but they can fall back to the Python built-in parsers. For other dependencies, we looked at other Python projects and how they choose theirs. There are two communities providing useful tools:
• https://github.com/encode
• https://github.com/MagicStack
They use the encode/httpx library, which supports async code (we don't need it now, but will with FastAPI).
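The namespace-prefix problem with the built-in parser can be reproduced with the standard library alone: xml.etree.ElementTree keeps the namespace URI but drops the original prefix, and on output invents its own (ns0, ns1, ...) unless a prefix was registered globally. XML canonicalization keeps prefixes, so re-serializing a signed document this way can change the bytes that get digested. This is only a demonstration of the stdlib behaviour; lxml, by contrast, preserves the document's prefixes.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
original = f'<saml:Assertion xmlns:saml="{SAML_NS}" ID="_a1"/>'

element = ET.fromstring(original)
# The tag is stored as "{namespace-uri}localname"; the "saml" prefix is gone.
print(element.tag)

# On output, ElementTree generates its own prefix for the namespace.
reserialized = ET.tostring(element, encoding="unicode")
print(reserialized)  # e.g. <ns0:Assertion xmlns:ns0="..." ID="_a1" />
```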
Ivan is working on a Satosa release that will include a makefile. There are some other MRs open, but the focus will be on https://github.com/IdentityPython/SATOSA/issues/404 and https://github.com/IdentityPython/SATOSA/pull/405. The way Satosa is built now, it cannot be served under a path within the base domain you set; the MRs associated with these issues should resolve that.
Another MR involves the ORCID IdP. That MR is making some things optional; Ivan is investigating.
c. pySAML2 - https://github.com/IdentityPython/pysaml2
The changes we talked about will impact pySAML2 somewhat, though users won't see much of a difference.
Ivan released a new version (7.2.0) - https://github.com/IdentityPython/pysaml2/releases/tag/v7.2.0 - no major changes. Mostly fixes, additional schemas (e.g., for eIDAS). There is a new option for a timeout for requests (see request module). Using cryptography to log certificates, which will allow us to support chains of certs.
Working on switching to poetry. This mostly works, but still investigating how to include schema files that are not python files. Also not sure how to update changelog notes automatically. Maybe make this a requirement of PRs?
d. Any other project (pyFF, djangosaml2, etc)
4 - AOB
Added documentation files to the different repos on how to submit code, report security issues, etc. Also writing rules in the makefiles.
Thanks! Heather
(Not sure how I missed sending these out; sorry about that!)
Attendees:
Ivan, Matthew, Heather
Notes:
0 - Agenda bash
1 - Administrivia
Note that GÉANT has a tender open for idpy developers. No public link, but if anyone knows anyone that might be interested, a job description has been posted to the #random channel on idpy slack.
2 - Documentation
FAQ vs in-line documentation in the code. The FAQ idea of using responses from issues was very disorganized; Heather spent some time on that and didn't find them very useful. The in-line docs as they stand today aren't sufficient: the functions aren't documented well enough to actually tell someone how to use them. But if we add the how-to to the doc strings, they'll be too long. Ivan sees the separation as: "what it is" belongs in the doc string, while "how to use it" belongs in a different place.
Even if the doc strings are only meant to be descriptive, they are falling behind and are incomplete.
Matthew points out that he's having to read a lot of source code to figure out how to implement the code. This works well enough for him, but it's hard to teach reverse engineering to new developers. There is good example code out there, but there are places where he doesn't even know he needs to go look at the sample code. What he would like to see online is everything: how to use it and the reference material (deep dive on function calls).
We have a short configuration how-to on readthedocs.io (how to build a small SP or IdP), but it's very basic. We could create a list of what we want to display there (e.g., how to create a custom SAML response, how to validate a signature, how to create a SAML request, how to get custom metadata). Part of the answers to those are in the Issues and PRs, and some are in the code itself. We'll need to improve the code (doc strings) at the same time. Let's start with a small list (see examples above).
Ivan is also working on a few small documents: release.md, security.md, developers.md, contributing.md. These answer things like how to create a release, how the code should look, what tools we use, what the doc strings should look like, how to write tests, how to report an incident, where the security policy is located.
See https://docs.github.com/en/repositories/releasing-projects-on-github/automa…
For additional documentation, create a new file here: https://github.com/IdentityPython/pysaml2/tree/master/docs
This will create a new page on readthedocs.io and we can crowdsource answering the questions. Need to use the .rst format. Start by just adding the files to the top level; we can reorganize into folders later.
3 - GitHub review
a. OIDC - https://github.com/IdentityPython (JWTConnect-Python-OidcRP, JWTConnect-Python-CryptoJWT, etc)
b. Satosa - https://github.com/IdentityPython/SATOSA
How should we announce the new docker container? The old repo is still around; should it be removed?
Matthew will draft text for the announcement; aim to post on the website and post in an email. Ivan will point to the official images in the repo.
c. pySAML2 - https://github.com/IdentityPython/pysaml2
d. Any other project (pyFF, djangosaml2, etc)
4 - AOB
Thanks! Heather