IPIP-337: Delegated Content Routing HTTP API #337

# IPIP 0000: Delegated Routing HTTP API

- Start Date: 2022-10-18
- Related Issues:
  - (add links here)

## Summary

This IPIP specifies an HTTP API for delegated routing.

## Motivation

Idiomatic, first-class HTTP support for delegated routing is an important requirement for large content routing providers,
and supporting large content providers is a key strategy for driving down IPFS latency.
These providers must handle high volumes of traffic and support many users, so they need to use industry-standard tools and services
such as HTTP load balancers, CDNs, and reverse proxies.
To maximize compatibility with standard tools, IPFS needs an HTTP API specification that uses standard HTTP idioms and payload encodings.
The [Reframe spec](https://github.com/ipfs/specs/blob/main/reframe/REFRAME_PROTOCOL.md) for delegated content routing was an experimental attempt at this,
but it resulted in a very unidiomatic HTTP API that is difficult to implement and incompatible with many existing tools.
The cost of properly redesigning, reimplementing, and maintaining Reframe is too high relative to the urgency of having a delegated routing HTTP API.

Note that this neither supplants nor deprecates Reframe. Ideally, Reframe and its implementation will in the future receive the resources needed to map the IDL to idiomatic HTTP,
and implementations of this spec could then be rewritten in the IDL, maintaining backwards compatibility.

Contributor: Why not just write the spec within the IDL of the data and define the transport to be this? It seems like it'd be easy enough except for the areas where the divergence of this API runs counter to some of the Reframe goals, which seem worth discussing. For example, I put an alternative that seems to capture some of your major changes below.

## Detailed design

See the [Delegated Routing HTTP API design](../routing/DELEGATED_ROUTING_HTTP.md) included with this IPIP.

## Design rationale

To understand the design rationale, it is important to consider the concrete Reframe limitations that we know about:

- Reframe [method types](../reframe/REFRAME_KNOWN_METHODS.md) are encoded inside messages
  - This prevents URL-based pattern matching on methods, which makes basic HTTP scaling and optimizations hard and expensive:

Contributor: This is not a property of Reframe; it is a property of the transport defined in https://github.com/ipfs/specs/blob/main/reframe/REFRAME_HTTP_TRANSPORT.md, while Reframe itself is independent of individual transports. For example, see #327, which doesn't touch any method specifications and just adds an alternative (v2) HTTP transport.

- Configuring different caching strategies for different methods
- Configuring reverse proxies on a per-method basis
- Routing methods to specific backends
- Method-specific reverse proxy config such as timeouts
- Developer UX is poor as a result; e.g. for CDN caching you must encode the entire request message and pass it as a query parameter
  - This was initially done by URL-escaping the raw bytes
    - This is not possible to consume correctly using standard JavaScript (see [edelweiss#61](https://github.com/ipld/edelweiss/issues/61))
    - This shipped in Kubo 0.16
  - Packing a CID into a struct, encoding it with DAG-CBOR, multibase-encoding that, percent-encoding that, and then passing it in a URL, rather than simply passing the CID in the URL, is needlessly complex from a user's perspective
- "Cacheable" methods add complexity by supporting both POSTs and GETs
- The required streaming support and message groups add a lot of implementation complexity, but streaming does not work for cacheable methods sent over HTTP

Contributor: What types of routers are "in bounds" here if streaming support is off the table? Here are a few routing systems in place today:
By making the API non-streaming you effectively only end up supporting cid.contact, in which case it doesn't seem so different from using the Indexer-specific API with the

- For example, for FindProviders the response is buffered anyway for ETag calculation

Contributor: This should not be true. When ETags were discussed, it was flagged that FindProviders cannot be forced to buffer data, although some implementations may choose to (e.g. storetheindex). If this happened, it was a breaking change that was not flagged as such. There was intentional bug fixing here to make sure streaming was supported: ipfs/go-delegated-routing#26.

- There are no limits on response sizes, nor ways to impose limits and paginate
- Streaming is useful for routers that have highly variable resolution times, since they can send results as soon as possible, but this is not a use case we are focusing on right now and we can add it later

Contributor: Some use cases I think it'd be quite useful for a delegated routing API to support.

Contributor (author):

The reason this doesn't include streaming is that there is no immediate need for it, and we can add it later. The narrow scope is intentional so that we can focus on nailing this particular use case. "Later" can mean immediately after this spec; I just don't think it's helpful to block indexer support on it, since indexers don't need it. For adding it later, consider a streaming response as just a different response format that can be included in content negotiation, such as ndjson with the same provider record schema. I believe there will remain value in supporting standard

What does this mean, concretely? Provider records have a

From the indexer POV, it would be great to see paging support so that we can use the protocol with the larger IPFS nodes.

Contributor:

I guess it depends on whose use cases you're thinking about, but IMO there are uses for it today (see some support in #337 (comment)). The library https://github.com/libp2p/js-libp2p-delegated-content-routing, which is particularly useful in browsers, cannot be replaced in a meaningful way with this API, which, as I've mentioned, means we're not making any progress here other than adding a new API for an existing system (Indexers) that already has an API.

I think it depends. For example, if the v1 protocol doesn't support streaming and v2 does, how long will the ecosystem be expected to support v1? Note that there is already an HTTP API that does exactly what the Indexers want, which IMO is largely unsuitable for content routing as a whole (and I suspect most participants in this PR agree, or else we'd be pushing to reuse that spec), unless this is really another attempt at the "Indexer HTTP API" with the wrong title.

IMO even if we were sticking with this, it would still be important for developers to be able to figure out what the data means, which means declaring what it means here or in a sibling spec.

Contributor:

I feel like I've listed multiple examples in my post earlier in this thread that are not supported, but I can work through them more explicitly.

There are a variety of ways to describe fetching IPLD data over HTTP (some examples here). However, to see the problems with the proposed scheme, consider the demo advertisement doing this with the Indexer. The flexibility in this PR matches exactly the flexibility in the Indexer protocol and its However, it has a problem: there's no good place to put the information. The demo above has a blank HTTP protocol ID, a bogus peer ID, and an HTTP multiaddr. This leads to some hacks being required to address this, such as:

Note: BitTorrent is roughly a similar story.

I gave multiple examples there, but to flesh out one: consider that I'd like to support Just like in the HTTP case, there's no PeerID, and in this case there's not even a multiaddr to be associated with. So where does this data go? For those wondering, there are a number of important applications of this for bringing IPFS compatibility to existing systems. You can take a look at https://github.com/aschmahmann/mdinc (and this PR) for background and a demo of making Docker data available over IPFS.

As an example: I should be able to create an endpoint (e.g. at routing.delegate.ipfs.io) that ingests FindProviders responses from, say, cid.contact, the DHT, and experimental-new-system.tld and responds with all of them as they arrive. If any of these systems adds new types of responses (e.g. new provider types), that shouldn't require updating the proxy in the middle to understand them in order for the new provider types to propagate through.

Contributor (author):

There is no need for a version bump; it can be introduced in a backwards-compatible way by adding a new content type such as

I agree, but I don't think this API should be interpreting its meaning; it is pass-through data, and asserting semantics here would hinder forwards compatibility. I would hesitate to even call it a "sibling" spec, as this API should be completely agnostic to the contents of the payload. IMO it is the responsibility of the producer of the data to document its structure and semantics for the consumer.

Contributor (author):

Yeah, I get the theoretical ideas, but I am having trouble understanding what it means concretely. Are you saying that the peer ID, multiaddrs, etc. are details of the transfer protocol and should really be part of the opaque payload? If so, that would make sense, and I can change provider records to use tagged unions. What about something like this?

Provider record schema:

```json
{
  "Protocol": "<multicodec_code>",
  ...
}
```

Bitswap provider record (multicodec_code=2320):

```json
{
  "Protocol": "2320",
  "PeerID": "12D3K...",
  "Multiaddrs": ["/ip4...", ...]
}
```

So then the full response becomes something like:

```json
{
  "Providers": [
    {
      "Protocol": "2320",
      "PeerID": "12D3K...",
      "Multiaddrs": ["/ip4...", ...]
    },
    {
      "Protocol": "99999",
      ...
    }
  ]
}
```

Is that like what you have in mind? (Protocols are stringified multicodec codes so that this is compatible with OpenAPI discriminators, which require discriminator properties to be string values, and they are codes instead of multicodec names because apparently names aren't as stable as codes.)

Agreed, that is a design goal of this proposal.

- The Identify method is not implemented because it is not currently useful
  - This is because Reframe's ambition is to be a generic catch-all bag of methods across protocols, while the delegated routing use case only requires a subset of its methods.
- Client and server implementations are difficult to write correctly because of the non-standard wire formats and conventions
  - Example: [bug reported by implementer](https://github.com/ipld/edelweiss/issues/62), and [another one](https://github.com/ipld/edelweiss/issues/61)
- The Go implementation is [complex](https://github.com/ipfs/go-delegated-routing/blob/main/gen/proto/proto_edelweiss.go) and [brittle](https://github.com/ipfs/go-delegated-routing/blame/main/client/provide.go#L51-L100), and is currently maintained by IPFS Stewards who are already over-committed with other priorities
- Only the HTTP transport has been designed and implemented, so it's unclear whether the existing design will work for other transports, and what their use cases and requirements are
  - This means Reframe can't be trusted to be transport-agnostic until at least a second transport is implemented (e.g. as a reframe-over-libp2p protocol).

So this API proposal makes the following changes:

- The Delegated Routing API is defined using HTTP semantics, and can be implemented without introducing Reframe concepts
- "Method names" and cache-relevant parameters are pushed into the URL path
- Streaming support is removed, and default response size limits are added along with an optional `limit` parameter for clients to specify response sizes
  - We might add streaming support with chunked-encoded responses in the future, but it's currently not an important feature for the use cases the HTTP API will serve
  - Pagination could be added in the future, if needed
- Bodies are encoded using standard JSON or CBOR, instead of using IPLD codecs
  - JSON uses human-friendly string encodings of common data types
    - CIDs are encoded as CIDv1 strings with a multibase prefix (e.g. base32), for consistency with CLIs, browsers, and [gateway URLs](https://docs.ipfs.io/how-to/address-ipfs-on-web/)
    - Multiaddrs use the [human-readable format](https://github.com/multiformats/multiaddr#specification) that is used in existing tools and Kubo CLI commands such as `ipfs id` or `ipfs swarm peers`
    - Byte array values, such as signatures, are multibase-encoded strings (with an `m` prefix indicating Base64)
- The "Identify" method and "message groups" are removed

### User benefit

The cost of building and operating content routing services will be much lower, as developers will be able to reuse existing industry-standard tooling.
They no longer need to learn Reframe-specific concepts to consume or expose the API.
This will result in more content routing providers, each providing a better experience for users, driving down content routing latency across the IPFS network
and increasing data availability.

### Compatibility

#### Backwards Compatibility

IPFS Stewards will implement this API in [go-delegated-routing](https://github.com/ipfs/go-delegated-routing), using breaking changes in a new minor version.
Because the existing Reframe spec can't be safely used in JavaScript, and we won't be investing time and resources into changing the wire format implemented in edelweiss to fix it,
the experimental support for Reframe in Kubo will be removed in the next release, and delegated routing will subsequently use this HTTP API.
We may decide to re-add Reframe support in the future once these issues have been resolved.

#### Forwards Compatibility

Standard HTTP mechanisms for forwards compatibility are used:

- The API is versioned using a version number in the path
- The `Accept` and `Content-Type` headers are used for content type negotiation
- New methods will result in new paths
- Parameters can be added using either new query parameters or new fields in the request/response body

Certain parts of bodies are labeled as "{ ... }"; these are opaque JSON values that the implementation passes through without any schema enforcement.

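To make the pass-through rule for opaque "{ ... }" values concrete, here is a minimal Python sketch of a hypothetical aggregating proxy that merges `GET /v1/providers/{CID}` responses from several upstream routers. The function name and merge behavior are illustrative assumptions, not part of this spec; the point is that provider records, including each protocol `Payload`, are copied verbatim rather than parsed against a schema.

```python
import json


def merge_provider_responses(bodies):
    """Merge the "Providers" arrays of several FindProviders JSON responses.

    Hypothetical helper for illustration. Each provider record is copied
    as-is: opaque values such as a protocol's "Payload" are never validated
    or rewritten, so fields and codecs this proxy does not understand still
    reach the client unchanged.
    """
    merged = []
    for body in bodies:
        merged.extend(json.loads(body).get("Providers", []))
    return json.dumps({"Providers": merged})
```

This sketch buffers complete responses, which matches the non-streaming scope of this IPIP; an unknown codec such as `99999` simply flows through.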
### Security

None

### Alternatives

This *is* an alternative.

Contributor: IMO the alternatives section is not obviated by the existence of existing specs. The existing spec doesn't solve a problem you have, which is why you proposed an alternative. It seems unlikely that this is the only way to solve the problem, just the one you currently think is the best. An alternative could be to just define a new HTTP-based transport for Reframe (e.g. #327) that takes some of the good ideas from this proposal. Below is a strawman taking most of the listed issues with Reframe, putting the ideas from here into a new transport and seeing what's left. All method names are

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

# Delegated Routing HTTP API

**Author(s)**:
- Gus Eggert

**Maintainer(s)**:

* * *

**Abstract**

"Delegated routing" is a mechanism that IPFS implementations can use to offload content routing to another process or server. This spec describes an HTTP API for delegated routing.

# Organization of this document

- [API Specification](#api-specification)
  - [Common Data Types](#common-data-types)
  - [API](#api)
  - [Limits](#limits)
  - [Error Codes](#error-codes)

# API Specification

The Delegated Routing HTTP API uses the `application/json` content type by default. Clients and servers *should* support `application/cbor`, which can be negotiated using the standard `Accept` and `Content-Type` headers.

## Common Data Types

- CIDs are always encoded using a [multibase](https://github.com/multiformats/multibase)-encoded [CIDv1](https://github.com/multiformats/cid#cidv1).
- Multiaddrs are encoded according to the [human-readable multiaddr specification](https://github.com/multiformats/multiaddr#specification).
- Peer IDs are encoded according to the [PeerID string representation specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation).
- Multibase bytes are encoded according to [the Multibase spec](https://github.com/multiformats/multibase), and *should* use Base64.

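The encodings above can be produced with the Python standard library alone. The sketch below hand-rolls the pieces for illustration and is not a substitute for a maintained multiformats library: multibase Base64 (`m`) and base32 (`b`) strings, plus a CIDv1 for the `raw` codec (0x55) over a sha2-256 multihash (0x12). All helper names are hypothetical; Peer ID and multiaddr string forms are not covered.

```python
import base64
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for the raw binary codec


def varint(n):
    """Encode a non-negative int as an unsigned LEB128 varint (multiformats style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multibase_encode(data, base="base64"):
    """Multibase-encode bytes: 'm' = base64 (unpadded), 'b' = lowercase base32 (unpadded)."""
    if base == "base64":
        return "m" + base64.b64encode(data).decode("ascii").rstrip("=")
    if base == "base32":
        return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")
    raise ValueError(f"unsupported base: {base}")


def multibase_decode(text):
    """Decode an 'm' or 'b' multibase string, restoring the padding the stdlib expects."""
    prefix, body = text[0], text[1:]
    if prefix == "m":
        return base64.b64decode(body + "=" * (-len(body) % 4))
    if prefix == "b":
        return base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    raise ValueError(f"unsupported multibase prefix: {prefix!r}")


def sha256_multihash(data):
    """<hash code varint><digest length varint><digest>."""
    digest = hashlib.sha256(data).digest()
    return varint(SHA2_256) + varint(len(digest)) + digest


def cidv1(codec, multihash):
    """CIDv1 = <version varint 1><codec varint><multihash>, multibase base32 encoded."""
    return multibase_encode(varint(1) + varint(codec) + multihash, "base32")
```

For example, `cidv1(RAW, sha256_multihash(b""))` yields a string beginning with `bafkreihd`, the kind of value that goes in the `{CID}` path segment of `GET /v1/providers/{CID}`.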
## API

- `GET /v1/providers/{CID}`
  - Reframe equivalent: FindProviders
  - Response

    ```json
    {
      "Providers": [
        {
          "PeerID": "12D3K...",
          "Multiaddrs": ["/ip4/.../tcp/.../p2p/...", "/ip4/..."],
          "Protocols": [
            {
              "Codec": 2320,
              "Payload": { ... }
            }
          ]
        }
      ]
    }
    ```

  - Default limit: 100 providers
  - Optional query parameters
    - `transfer`: only return providers that support the given transfer protocols, expressed as a comma-separated list of [multicodec codes](https://github.com/multiformats/multicodec/blob/master/table.csv) in decimal form, such as `2304,2320`
    - `transport`: only return providers whose published multiaddrs explicitly support the given transport protocols, such as `460,478` (`/quic` and `/tls/ws`)
    - Servers should treat the multicodec codes used in the `transfer` and `transport` parameters as opaque and should not validate them, for forwards compatibility
- `GET /v1/providers/hashed/{multihash}`
  - This is the same as `GET /v1/providers/{CID}`, but takes a hashed CID encoded as a [multihash](https://github.com/multiformats/multihash/)
- `GET /v1/ipns/{ID}`
  - Reframe equivalent: GetIPNS
  - `ID`: multibase-encoded bytes
  - Response
    - Record bytes
- `POST /v1/ipns`
  - Reframe equivalent: PutIPNS
  - Body

    ```json
    {
      "Records": [
        {
          "ID": "multibase bytes",
          "Record": "multibase bytes"
        }
      ]
    }
    ```

  - Not idempotent (idempotency doesn't really make sense for IPNS)
  - Default limit of 100 records per request
- `PUT /v1/providers`
  - Reframe equivalent: Provide
  - Body

    ```json
    {
      "Signature": "multibase bytes",
      "Payload": {
        "Keys": ["cid1", "cid2"],
        "Timestamp": 1234,
        "AdvisoryTTL": 1234,
        "Signature": "multibase bytes",
        "Provider": {
          "PeerID": "12D3K...",
          "Multiaddrs": ["/ip4/.../tcp/.../p2p/...", "/ip4/..."],
          "Protocols": [
            {
              "Codec": 1234,
              "Payload": { ... }
            }
          ]
        }
      }
    }
    ```

  - `Signature` is a multibase-encoded signature of the encoded bytes of the `Payload` field, signed using the private key of the Peer ID specified in the `Payload`. See the [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#keys) specification for the encoding of Peer IDs. Servers must verify the payload using the public key from the Peer ID. If verification fails, the server must return a 403 status code.
  - Idempotent
  - Default limit of 100 keys per request
- `GET /v1/ping`
  - Returns 200 once the server is ready to accept requests

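As a client-side illustration of `GET /v1/providers/{CID}` and its optional filters, the sketch below builds the request URL and picks out providers that list a given protocol codec. It assumes a hypothetical router at `https://router.example` and uses only `urllib.parse` and `json`; sending the HTTP request is left to whatever client you use, and the helper names are not part of any existing library.

```python
import json
from urllib.parse import quote, urlencode


def find_providers_url(base_url, cid, transfer=(), transport=()):
    """Build a GET /v1/providers/{CID} URL with the optional filter parameters.

    `transfer` and `transport` are lists or tuples of decimal multicodec codes.
    """
    url = f"{base_url.rstrip('/')}/v1/providers/{quote(cid, safe='')}"
    params = {}
    if transfer:
        params["transfer"] = ",".join(str(code) for code in transfer)
    if transport:
        params["transport"] = ",".join(str(code) for code in transport)
    if params:
        # Keep commas literal so the query reads like the spec, e.g. transfer=2304,2320.
        url += "?" + urlencode(params, safe=",")
    return url


def providers_with_codec(body, codec):
    """Map PeerID -> Multiaddrs for providers that list `codec` in their Protocols."""
    matches = {}
    for provider in json.loads(body).get("Providers", []):
        codecs = {proto.get("Codec") for proto in provider.get("Protocols", [])}
        if codec in codecs:
            matches[provider["PeerID"]] = provider.get("Multiaddrs", [])
    return matches
```

For example, `find_providers_url("https://router.example/", cid, transfer=[2304, 2320], transport=[460, 478])` produces `.../v1/providers/<cid>?transfer=2304,2320&transport=460,478`. The client-side codec check only illustrates reading the response shape; each `Payload` is left untouched.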
## Limits

- Responses with collections of results must have a default limit on the number of results that will be returned in a single response
- Pagination and/or dynamic limit configuration may be added to this spec in the future, once there is a concrete requirement

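One way to honor these limits, sketched in Python under the assumption that server-side results come from an iterator: cap a lazily produced result set at the default of 100 without materializing the rest, and on the client side split a large IPNS publish into `POST /v1/ipns` bodies of at most 100 records. The function names are hypothetical; the value 100 is the default stated in the API section.

```python
import base64
import json
from itertools import islice

DEFAULT_LIMIT = 100  # default limit used throughout the API section


def limit_results(results, limit=DEFAULT_LIMIT):
    """Return at most `limit` results; a lazy iterator is not consumed past the cap."""
    return list(islice(results, limit))


def _multibase_b64(data):
    """Multibase base64 string ('m' prefix, no padding)."""
    return "m" + base64.b64encode(data).decode("ascii").rstrip("=")


def ipns_put_bodies(records, max_per_request=DEFAULT_LIMIT):
    """Yield POST /v1/ipns JSON bodies holding at most `max_per_request` records each.

    `records` is a list of (id_bytes, record_bytes) pairs.
    """
    for start in range(0, len(records), max_per_request):
        batch = records[start:start + max_per_request]
        yield json.dumps({"Records": [
            {"ID": _multibase_b64(ipns_id), "Record": _multibase_b64(record)}
            for ipns_id, record in batch
        ]})
```

Using `islice` means a backend that yields providers lazily stops being read once the limit is reached, which keeps response size bounded without pagination.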
## Error Codes

- A 404 must be returned if a resource was not found
- A 501 must be returned if a method is not supported
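The Python sketch below shows how a server might choose status codes: 501 for API methods it does not implement, 404 when a lookup finds nothing, and the 403 required when a `PUT /v1/providers` signature fails verification. Several details are assumptions rather than spec: treating an empty provider lookup as "not found", returning 200 on success and 400 for a malformed signature string, using compact sorted-key JSON of `Payload` as "the encoded bytes", and the `verify` callback, which stands in for the key-type-specific signature check that the standard library does not provide.

```python
import base64
import binascii
import json

# Hypothetical: (HTTP method, path prefix) pairs that this particular server implements.
IMPLEMENTED = {("GET", "/v1/providers/"), ("GET", "/v1/ping")}


def route_status(method, path, implemented=IMPLEMENTED):
    """Return 501 for API methods this server does not support, else None."""
    for allowed_method, prefix in implemented:
        if method == allowed_method and path.startswith(prefix):
            return None
    return 501


def find_providers_status(records):
    """Assumption: an empty lookup result is reported as 404 (not found)."""
    return 200 if records else 404


def provide_status(body, verify):
    """Status for a PUT /v1/providers body; 403 when the signature does not verify.

    `verify(peer_id, message, signature) -> bool` is a caller-supplied stand-in
    for checking the signature with the public key derived from the Peer ID.
    """
    request = json.loads(body)
    sig_text = request["Signature"]
    if not sig_text.startswith("m"):  # the spec says multibase bytes should use Base64
        return 400
    encoded = sig_text[1:]
    try:
        signature = base64.b64decode(encoded + "=" * (-len(encoded) % 4), validate=True)
    except binascii.Error:
        return 400
    payload = request["Payload"]
    # Assumed canonical form of "the encoded bytes of the Payload field".
    message = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    peer_id = payload["Provider"]["PeerID"]
    return 200 if verify(peer_id, message, signature) else 403
```

Keeping signature checking behind a callback leaves the choice of key types and crypto library to the implementation, while the 403 path stays exactly as the spec requires.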