Hi all,
being part of Commons Conservancy brought up yet another subject,
which is whether we should add a header with license information in
every file in the projects under idpy. This is not something done in
an abstract way, there is a specific format modelling this information
(see https://spdx.org/ and https://reuse.software/ - more specifically
https://reuse.software/practices/2.0/). Still, I find it problematic.
We want to open up the question to the wider community and consider
their thoughts on this. The forwarded message below is discussing this
subject. You can see the question we posed, the answer we got and my
comments. Feel free to tell us what you think about this.
---------- Forwarded message ---------
Date: Thu, 16 May 2019 at 09:56
> ---------- Forwarded message ----------
> Date: May 8, 2019, 8:15 AM -0700
>
> > Why does CC think having a single license file per project is
> > insufficient? Our thought is that if we can avoid adding a header to
> > every single file, that would be nice, esp. given we already have this
> > info in the license file and we have the Note Well.
>
>
> this is not just our opinion, but something that is an industry and
> community standard for legal compliance these days. When companies like
> Siemens, Samsung or Honeywell use some code in one of the hundreds or
> thousands of devices and systems in their product line, they need to be
> able to provide the correct license and a download of the exact version.
> This means machine readability too.
>
I've actually observed the opposite of that. Communities abandon the
"license in every file" model, and just use a single LICENSE file in
the root of the project. The LICENSE file contains license
information, that is, it is not a single license but it has exception
sections and so on.
> To quote from https://reuse.software/practices/2.0/ :
>
> Scroll to the section "2. Include a copyright notice and license in each
> file"...
>
> "Source code files are often reused across multiple projects, taken from
> their origin and repurposed, or otherwise end up in repositories where
> they are separate from its origin. You should therefore ensure that all
> files in your project have a comment header that convey that file’s
> copyright and license information: Who are the copyright holders and
> under which license(s) do they release the file?
>
Continuing from above, the standardization of package-management
formats and tools has helped exactly with that: to avoid distribution
of single files, and instead provide packages and modules. It is bad
practice and considered a hack to copy files. Nobody liked that model
and everyone is moving away; it is unstructured, it becomes
unmanageable and it will cause problems.
> It is highly recommended that you keep the format of these headers
> consistent across your files. It is important, however, that you do not
> remove any information from headers in files of which you are not the
> sole author.
>
> You must convey the license information of your source code file in a
> standardised way, so that computers can interpret it. You can do this
> with an SPDX-License-Identifier tag followed by an SPDX expression
> defined by the SPDX specifications."
>
> (the text goes on for a while after this, to clarify the point but this
> is the basic gist of it)
>
> There is a nice Python tool to check:
>
> https://github.com/fsfe/reuse-tool
>
> I hope this makes sense
>
Well, it does not make complete sense. We're talking about licensing a
project. A project is not just code; there are data files (html, xml,
yaml, json files), binary files (archives/zip, images, audio, video,
etc), text files (configs, ini-files, etc) all "not-code". How do you
mark those files? Does the LICENSE file need a license-header? The
json format does not define comments, how do you add a header there?
If a binary file does not get a license header, why should a file with
code get one?
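To make the mismatch concrete, here is a minimal sketch of how a
machine could look for the SPDX tag in a file's text. The function
name and the sample contents are made up for illustration; this is not
how reuse-tool is implemented. A Python file can carry the tag in a
comment, while a plain JSON document has nowhere to put it.

```python
import re

# Matches "SPDX-License-Identifier: <expression>" anywhere in the text.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*)")


def spdx_expression(text):
    """Return the SPDX expression found in the text, or None (illustrative helper)."""
    match = SPDX_TAG.search(text)
    return match.group(1).strip() if match else None


# Hypothetical file contents, used only for this example.
python_source = "# SPDX-License-Identifier: Apache-2.0\nimport os\n"
json_document = '{"entityid": "https://example.org/sp"}'

print(spdx_expression(python_source))
print(spdx_expression(json_document))
```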
I would expect there to be a way to have the needed information
unified. If the files themselves cannot provide this information it
has to be external; thus the LICENSE file. If someone is worried about
somebody else re-using single files that do not have license
information (a python file, a png image, etc) there is really nothing
you can do (the DRM industry has been trying to solve this for a long
time; and still your best bet is "social DRM").
Since we're developing open source with a permissive license, even
if someone does that, should we be happy that someone is actually
using what we built or sad that the files they copied did not have a
license header? And if they include the license information of that
copied file in their project's LICENSE file, is this solved?
Having pointed out these contradictions, I am thinking that the "license
in every file" model seems to be a step backwards. It is introducing
overhead and does not really solve the problem, while at the same time
it enables a culture of bad practice (copying files around).
Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
Hello everyone,
there has been a report on incident-response at idpy.org about a security
issue in PySaml2.
Alexey Sintsov and Yuri Goltsev from HERE Technologies reached out and
reported an XML Signature Wrapping (XSW) vulnerability. The issue
affects responses with signed assertions. PySaml2 can be tricked into
thinking that an assertion has been signed and using the assertion
information, when in reality the Signature points to another part of
the xml document that is controlled by another party.
The issue was assigned CVE-2020-5390 and is now fixed in the latest
pysaml2 release.
The relevant code commit that fixes the issue:
https://github.com/IdentityPython/pysaml2/commit/5e9d5acbcd8ae45c4e736ac521…
Release v5.0.0 contains more changes, including:
- Add freshness period feature for MetaDataMDX
- Fix ipv6 validation to accommodate for addresses with brackets
- Fix xmlsec temporary files deletions
- Add method to get supported algorithms from metadata
- Add mdstore method to extract assurance certifications
- Add mdstore method to extract contact_person data
- Start dropping python2 support
Pointers to the release with changelog and more information, below:
- the relevant release commit:
https://github.com/IdentityPython/pysaml2/commit/f27c7e7a7010f83380566a219f…
- the github release:
https://github.com/IdentityPython/pysaml2/releases/tag/v5.0.0
- the pypi package:
https://pypi.org/project/pysaml2/5.0.0/
+ + + + + + + +
In more detail, regarding the XSW vulnerability:
libxml2 follows the xmldsig-core specification. The xmldsig specification
is way too general. saml-core reuses the xmldsig specification, but
constrains it to the use of specific facilities. The implementation of the
SAML specification is responsible for enforcing those constraints.
libxml2/xmlsec1 are not aware of those constraints and thus process the
document based on the full/general xmldsig rules.
What is happening is the following:
- xmldsig-core allows the signature information and the data that was
  signed to be in different places. This works by setting the URI
  attribute of the Reference element. The URI attribute contains an
  optional identifier of the object being signed (see "4.4.3 The
  Reference Element" --
  https://www.w3.org/TR/xmldsig-core1/#sec-Reference). This identifier is
  actually a pointer that can be defined in many different ways; from
  XPath expressions that need to be executed(!), to a full URL that
  should be fetched(!) in order to recalculate the signature.
- saml-core section "5.4 XML Signature Profile" defines constraints on
  the xmldsig-core facilities. It explicitly dictates that enveloped
  signatures are the only signatures allowed. This means that:
  * Assertion/RequestType/ResponseType elements must have an ID attribute
  * signatures must have a single Reference element
  * the Reference element must have a URI attribute
  * the URI attribute contains an anchor
  * the anchor points to the enclosing element's ID attribute
xmlsec1 does the right thing - it follows the reference URI pointer and
validates the assertion. But the pointer points to an assertion in another
part of the document; not the assertion in which the signature is
embedded/enveloped. SAML processing thinks that the signature is fine
(that's what xmlsec1 said), and gets the assertion data from the assertion
that contains the signature - but that assertion was never validated. The
issue is that pysaml2 does not enforce the constraints that the saml-core
spec defines on the signature validation facilities of xmldsig-core.
The solution is simple; all we need is to make sure that assertions with
signatures (1) contain one Reference element that (2) has a URI attribute
(3) that is an anchor that (4) points to the assertion in which the
signature is embedded. If those conditions are met then we're good,
otherwise we should fail the verification.
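The four conditions can be sketched roughly as follows. This is an
illustration using only the standard library, not the code from the
actual pysaml2 commit; the function name and the helper that builds a
test assertion are made up for this example. It checks only the shape
of the reference; the cryptographic verification itself would still be
done by xmlsec1.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
DS = "{http://www.w3.org/2000/09/xmldsig#}"


def signature_references_enclosing_assertion(assertion):
    """Return True only if the enveloped signature points at this assertion."""
    # Only consider a Signature that is a direct child of the assertion.
    signature = assertion.find(DS + "Signature")
    if signature is None:
        return False
    references = signature.findall(DS + "SignedInfo/" + DS + "Reference")
    if len(references) != 1:       # (1) exactly one Reference element
        return False
    uri = references[0].get("URI")
    if not uri:                    # (2) the URI attribute is present
        return False
    if not uri.startswith("#"):    # (3) the URI is an anchor
        return False
    assertion_id = assertion.get("ID")
    # (4) the anchor names the assertion that envelops the signature
    return bool(assertion_id) and uri[1:] == assertion_id


def build_assertion(assertion_id, reference_uri):
    """Build a stripped-down signed assertion (hypothetical test helper)."""
    return ET.fromstring(
        f'<saml:Assertion xmlns:saml="{SAML_NS}" '
        f'xmlns:ds="http://www.w3.org/2000/09/xmldsig#" ID="{assertion_id}">'
        f'<ds:Signature><ds:SignedInfo><ds:Reference URI="{reference_uri}"/>'
        f"</ds:SignedInfo></ds:Signature></saml:Assertion>"
    )


honest = build_assertion("a1", "#a1")
wrapped = build_assertion("evil", "#a1")  # signature points at another element
print(signature_references_enclosing_assertion(honest))
print(signature_references_enclosing_assertion(wrapped))
```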
--
Ivan c00kiemon5ter Kanakarakis >:3
Hi Leif,
I added 2 pipes to buildin.py:
- publish_html: creates static HTML views of IDPs and SPs, using XSLT based on Peter Schober’s alternative to MET;
- publish_split: similar to store, but adds validUntil and creates a signed XML file per EntityDescriptor. This can be consumed dynamically by ADFS in an IDP role.
I put it directly into buildin.py because it shares some code with the sign pipe. Is this viable from your PoV? If yes, I would make a PR.
Cheers, Rainer
Attendees
Roland, Ivan, Heather, Johan, John, Scott, Christos
Notes:
1 - GitHub review
a. OIDC - https://github.com/IdentityPython (JWTConnect-Python-OidcRP, JWTConnect-Python-CryptoJWT, etc)
Libraries are progressing. Last time, we talked about the abstract storage interface. Working on getting the whole stack to use that. The goal being to start two OPs at the same time, sharing the data (and the workload). Roland has heard no comments on the design, and so is moving ahead. Ivan’s thoughts are that he might have chosen a simpler API himself, but overall this works.
After this, we can get into scopes per RP.
b. Satosa - https://github.com/IdentityPython/SATOSA
Ivan still needs to write down his thoughts on the overall architecture for idpy, as well as for the specific projects like Satosa.
Currently looking at working with the WSGI interface. When a network request comes in, before it gets to the apps, it must go through different middleware layers. The goal is to make this a first-class concept in Satosa, with a new config option about what tools will run those layers.
Logging: Looking to use the bind interface, so when we switch to structlog, we will be compatible with that interface. The middleware will create or get the sessionID, which will be used throughout Satosa, and will bind it to the logger. We want to have a single logger or dictionary that will be added to the logger output every time the logger is invoked. That data needs to be available through the flow, and then needs to be cleaned when it leaves the proxy. That will be done through a middleware as well.
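A rough sketch of the idea, using only the standard library rather than structlog: a WSGI middleware gets or creates the sessionID, binds it to a logger for the duration of the request, and removes the bound logger on the way out. The class name and the environ keys ("example.session_id", "example.logger") are placeholders invented for this illustration, not Satosa names.

```python
import logging
import uuid


class SessionIdMiddleware:
    """WSGI middleware that binds a session ID to a per-request logger."""

    def __init__(self, app, environ_key="example.session_id"):
        self.app = app
        self.environ_key = environ_key

    def __call__(self, environ, start_response):
        # Reuse an existing session ID if one is present, otherwise create one.
        session_id = environ.get(self.environ_key) or uuid.uuid4().hex
        environ[self.environ_key] = session_id
        # The adapter carries the bound data into every log call made with it.
        environ["example.logger"] = logging.LoggerAdapter(
            logging.getLogger("proxy"), {"session_id": session_id}
        )
        try:
            return self.app(environ, start_response)
        finally:
            # Clean up the bound logger when the request leaves the proxy.
            environ.pop("example.logger", None)


seen = {}


def demo_app(environ, start_response):
    """Tiny WSGI app that records what the middleware provided."""
    seen["session_id"] = environ["example.session_id"]
    seen["bound"] = environ["example.logger"].extra
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


request_environ = {}
body = SessionIdMiddleware(demo_app)(request_environ, lambda status, headers: None)
```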
Scott offers a rough use case where there might be a need for a change in the format of the sessionID. It might be helpful if this can be reconfigured on the fly. If the middleware objects are exposed such that they can be accessed by other objects or components, that might be helpful. Ivan suggests always running the default one, and having code in the middleware (not the microservice) that overwrites the sessionID if certain conditions are met. Ivan will consider this further.
Ivan is also looking at doing some more error handling, and redirecting to different pages. If we have the concept of middlewares, we can have a global handler that handles all the exceptions, fills in a WSGI parameter, and then uses the appropriate middleware to handle the error. This way, the user can decide how the error needs to be handled. This will improve the flexibility of how the whole system behaves.
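As an illustration only, a global handler of this kind could look like the sketch below: it catches any exception, records it in the WSGI environ, and dispatches to a handler chosen by exception type. All names here (ErrorHandlingMiddleware, the "example.error" key, the two handlers) are hypothetical and not part of Satosa.

```python
class ErrorHandlingMiddleware:
    """WSGI middleware that routes exceptions to configurable handlers."""

    def __init__(self, app, handlers=None):
        self.app = app
        # Maps an exception type to a WSGI callable that renders the error.
        self.handlers = handlers or {}

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception as exc:
            # Record the error so the chosen handler can inspect it.
            environ["example.error"] = exc
            return self._find_handler(exc)(environ, start_response)

    def _find_handler(self, exc):
        for exc_type, handler in self.handlers.items():
            if isinstance(exc, exc_type):
                return handler
        return default_error_page


def default_error_page(environ, start_response):
    start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
    return [b"Something went wrong"]


def redirect_to_login(environ, start_response):
    start_response("302 Found", [("Location", "/login")])
    return [b""]


def failing_app(environ, start_response):
    raise KeyError("missing state")


statuses = []
record = lambda status, headers: statuses.append(status)
ErrorHandlingMiddleware(failing_app, {KeyError: redirect_to_login})({}, record)
ErrorHandlingMiddleware(failing_app)({}, record)
```

The deployer decides what goes into the handlers mapping, which is where the flexibility comes from.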
There are also some merge requests that Ivan still needs to look at.
c. pySAML2 - https://github.com/IdentityPython/pysaml2
Merged the fix for the ID attribute names (decrypt, encrypt, sign, or verify). This has been a merge request for a while. We can probably iterate on this idea some more, like how to find the node that holds all the signing or encryption information.
Also planning to accept the merge request on giving the namespace prefixes proper names. (https://github.com/IdentityPython/pysaml2/pull/625)
Ivan will be looking into refactoring how we select algorithms, and how we can select default ones. Right now, SHA-1 is hardcoded. This work will happen after the next release.
Issue about the scopes on the attributes: this was filed on the Satosa repo, but it is probably pySAML2 that will be doing this for Satosa. (https://github.com/IdentityPython/SATOSA/issues/297)
d. pyFF - https://github.com/IdentityPython/pyFF
n/a
2 - AOB
Note that work will be starting on the eIDAS front end code.
Thanks! Heather
Attendees
Roland, Scott, Giuseppe, Heather, Johan, Matthew, Hannah, Ivan, John P, Leif
Notes:
1 - GitHub review
a. OIDC - https://github.com/IdentityPython (JWTConnect-Python-OidcRP, JWTConnect-Python-CryptoJWT, etc)
Roland and Giuseppe started looking at abstract storage - a way to persist information that the OP/RP gather during their lifetime. They started a draft to describe this, and an implementation using SQLAlchemy. This changes the CryptoJWT API slightly. There is an abstract storage library now in GitHub, and Roland has a branch on CryptoJWT (abstorage) that uses what Giuseppe has.
See PoC: https://github.com/peppelinux/pyAbstractStorage
Ivan notes we probably want to avoid being tied to SQLAlchemy; the design should allow for other backends (e.g., MongoDB).
Please review and submit issues or PRs.
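For readers new to the idea, a minimal sketch of what a backend-neutral storage interface can look like is given below. It is not the pyAbstractStorage API; the class and method names are invented for illustration. The calling code depends only on the abstract interface, so an SQLAlchemy or MongoDB backend could replace the in-memory one without changes to the caller.

```python
from abc import ABC, abstractmethod


class AbstractStorage(ABC):
    """Interface the OP/RP code would depend on (hypothetical names)."""

    @abstractmethod
    def get(self, key, default=None):
        """Return the stored value for key, or default."""

    @abstractmethod
    def set(self, key, value):
        """Store value under key."""

    @abstractmethod
    def delete(self, key):
        """Remove key if it exists."""


class InMemoryStorage(AbstractStorage):
    """Dictionary-backed implementation, useful for tests."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class Provider:
    """Stand-in for an OP that persists its state through the interface."""

    def __init__(self, storage):
        self.storage = storage


# Two providers sharing one store see the same data.
shared = InMemoryStorage()
op_one, op_two = Provider(shared), Provider(shared)
op_one.storage.set("session:abc", {"sub": "user1"})
print(op_two.storage.get("session:abc"))
```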
b. Satosa - https://github.com/IdentityPython/SATOSA
Still working on the exception handling, designing a layered approach to how we handle errors. Instead of a single function, will try to propagate exceptions to other layers so they will handle them as they need.
Also doing work on the loggers and have a few things ready. Before pushing, Ivan wants to make sure we have a compatible path for the new library and dependencies. This is based on structlog.
Whenever the state of the proxy is saved in the cookie and the user is redirected to another service, we need to clean up the state of the cookie. We’re not near any limits wrt the cookie, but this will be a way to keep things clean.
One use case: stopping a SAML flow with Satosa, and then allowing the user to go through an entirely different SAML flow that still uses the same Satosa instance, then putting the user back into the original SAML flow. An example of this is a step-up flow. Ivan has some code that does this. The consent microservice also does this. The routing mechanism, though, needs to be rewritten. This will be one of the bigger refactorings we need to look into.
https://github.com/IdentityPython/SATOSA/pull/322 - the flow that can start from the discovery service; this has been deferred for a while - is it ready to roll? Not yet; will review on next call
https://urlproxy.sunet.se/canit/urlproxy.php?_q=aHR0cHM6Ly9mdW5jdGlvbnRyYWN… - allows you to get info from a running service. Ivan is trying to make this work with Satosa; it was originally designed to run around an executable, not in a flow like ours.
c. pySAML2 - https://github.com/IdentityPython/pysaml2
Someone reported an issue where building rpms was not working. Given the age of these systems, we’re not going to try and fix this. Even if we wanted to support this, it is more properly an issue for the Python maintainers.
Another issue (https://github.com/IdentityPython/pysaml2/issues/675) was about pylint errors - most of the issues were a result of not all libraries being installed.
On the list:
• https://github.com/IdentityPython/pysaml2/pull/662 - want to merge 662 (cleanup on the name of the id attribute used by xmlsec in order to encrypt/decrypt/sign XML documents, specifically SAML).
• https://github.com/IdentityPython/pysaml2/pull/665 - Also want to merge a fix for the tmp file creation. Python gives you a way to create tmp files, but this only works for UNIX systems. This needs to be refactored in order to support Windows.
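One portable pattern, sketched here on the assumption that the problem is the documented tempfile limitation (on Windows, a NamedTemporaryFile that is still open cannot be opened a second time by name, for example by an external xmlsec1 process): write the data, close the file, hand out the path, and delete the file explicitly afterwards. The helper name is made up and this is not the code from the pull request.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def closed_temp_file(data, suffix=".xml"):
    """Write data to a temp file, close it, yield its path, then remove it."""
    # delete=False keeps the file on disk after close(), so another process
    # can open it by name on both UNIX and Windows.
    handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        handle.write(data)
        handle.close()
        yield handle.name
    finally:
        handle.close()  # harmless if the file is already closed
        if os.path.exists(handle.name):
            os.remove(handle.name)


with closed_temp_file(b"<doc/>") as path:
    with open(path, "rb") as reopened:
        content = reopened.read()
    existed_inside = os.path.exists(path)
exists_after = os.path.exists(path)
```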
d. pyFF - https://github.com/IdentityPython/pyFF
2 - AOB
FastFed - could the structure be useful to us? Maybe - the work started with a specific task to connect an OP with an SP. But this does not meet many of the needs of Higher Ed, and many in the higher ed sector have voted against it. The vote did pass, but they have asked for a representative of the group that voted against it to join and work with them on the spec. They had asked early on if they could use the OIDC federation work, and that would have covered many points but not the attribute mapping issues they also had.
The proxy should do attribute mapping better as well; that’s on the list.