refactor!(splitter): abstract io deps#197
Conversation
0652ae8 to
cc77210
Compare
3a3cda4 to
92414d6
Compare
92414d6 to
eb4073f
Compare
ShaMan123
left a comment
There was a problem hiding this comment.
This PR is ready for review @agviegas .
A few things:
extractandsplitdiffer in logging.splitis more verbose. I wish to unify them. Which do you prefer? Search for occurrences ofIfcSplitter#mark.- I think we should extract
forEachLineto a super classIfcStream, perhaps we can offer an abstraction forwriteLineas well. It is a useful util. - I wish to remove bitwise operations from the code. It is an unnecessary complication with no real gain in this case.
- I wish to add unit tests, how do I do that. I see
jestas a dep but it doesn't seem used. I would suggest dropping it in favor ofvitest. A simple snapshot test of thedataevent can be very valuable for us now that we plan to change behavior. This can be done in a follow up since I have run a snapshot test locally comparing both the data event (groupsData) and the split file contents. - Should we extract all helpers to a standalone util file?
- Warning events currently fire only from
extract. I could type it so it is clear. - I scoped
emitSplitLineandemitExtractLineunderIfcSplitterbut after completing work I see it is not a must.
What do you think?
| let tail = ""; | ||
| const readableStream = await this.io.readableStream(filePath); | ||
|
|
||
| for await (const chunk of streamAsyncIterator(readableStream)) { |
There was a problem hiding this comment.
| for await (const chunk of streamAsyncIterator(readableStream)) { | |
| for await (const chunk of readableStream) { |
This should be sufficient though TS throws:
error TS2504: Type 'ReadableStream<any>' must have a '[Symbol.asyncIterator]()' method that returns an async iterator.I am not sure why TS throws, maybe because of the baseline (but that seems widely adopted). Updating @types/node made no difference.
So this is good enough IMO.
| const groupElementIds = groups[g]; | ||
| if (groupElementIds.size === 0) { | ||
| groupsData.push(null); | ||
| console.log(` Group ${g + 1}: SKIPPED (empty)`); |
There was a problem hiding this comment.
should this become a warning?
| if (groupElementIds.size > requestedIds.size) { | ||
| console.log( | ||
| ` Expanded to ${groupElementIds.size} elements (void/fill + aggregation coupling)`, | ||
| ); | ||
| return { header, footer, index }; | ||
| } |
There was a problem hiding this comment.
Should this become a warning?
I decided to return the extracted ids instead
| return this.specialRaws.get(id); | ||
| } | ||
|
|
||
| getAll(types: Set<string>) { |
There was a problem hiding this comment.
Future proofing by passing types
| includeSet: Set<number>, | ||
| rewrittenLines: Map<number, string>, | ||
| ): Promise<void> { | ||
| if (raw.charCodeAt(0) !== 35) return; // '#' |
There was a problem hiding this comment.
For readability should I:
| if (raw.charCodeAt(0) !== 35) return; // '#' | |
| if (raw.charCodeAt(0) !== '#'.charCodeAt(0)) return; // '#' |
Or even better
| if (raw.charCodeAt(0) !== 35) return; // '#' | |
| if (!raw.startsWith('#')) return; |
There are a few occurrences. Machine code 👎
| if (groupSizes[g] < groupSizes[minIdx]) minIdx = g; | ||
|
|
||
| this.emitProgressEvent("resolve"); | ||
| this.dispatch("data", groupsData); |
There was a problem hiding this comment.
Naming is not great since this event is only dispatched from split.
| this.dispatch("data", groupsData); | |
| this.dispatch("splits", groupsData); |
|
Expanding on const stream = (await openAsBlob(path, { type: "text/plain" }))
.stream()
.pipeThrough(new IfcDecoderStream());
for await (const line of stream) {
// actual raw line
}Claude said: class IfcDecoderStream extends TransformStream<Uint8Array, string> {
constructor(encoding = "utf-8") {
let tail = "";
const decoder = new TextDecoder(encoding);
super({
transform(chunk, controller) {
const text = decoder.decode(chunk, { stream: true });
if (!text) return;
let start = 0;
let idx = text.indexOf("\n");
if (idx !== -1) {
let end = idx;
if (end > 0 && text.charCodeAt(end - 1) === 13) end--;
controller.enqueue(
tail ? tail + text.substring(start, end) : text.substring(start, end),
);
tail = "";
start = idx + 1;
} else {
tail += text;
return;
}
while ((idx = text.indexOf("\n", start)) !== -1) {
let end = idx;
if (end > start && text.charCodeAt(end - 1) === 13) end--;
controller.enqueue(text.substring(start, end));
start = idx + 1;
}
if (start < text.length) tail = text.substring(start);
},
flush(controller) {
const remaining = decoder.decode();
const full = tail + remaining;
if (full) controller.enqueue(full);
},
});
}
}we could even |
bab6690 to
fc382e2
Compare
|
I really like the stream transformer. |
|
I see I missed this spec
Awaiting your review before acting |
|
Could we use const stream = (await openAsBlob(path, { type: "text/plain" }))
.stream()
.pipeThrough(new IfcDecoderStream())
.pipeThrough(new IfcParserStream());
for await (const ifcEntity of stream) {
// parsed line entity, flat
} |
fc382e2 to
3dde676
Compare
|
I want to merge that on top of this. Should we first merge this and then open a dedicated PR? |
|
Direction is great. Stripping the Node Bugs that need fixing
Worth a second look
Polish
Nits
Bugs 1, 2, 3 are blocking from my side. The rest are suggestions. Let me know when you push the next round and I'll take another pass. |
|
Sorry, missed your questions on the first pass. Going through them: Logging unification. With the event API the verbosity isn't really logging anymore. Apps subscribe to whatever they care about. So I'd unify by having both
Remove bitwise ops. I'd push back. The Tests, drop jest for vitest. Yes. Vitest is the right pick today. A snapshot of the Extract helpers to a standalone util file. Yes. The pure functions ( Warning event typing. A discriminated union or just a typed comment that says "fires from
Stream transformer expansion. Love what you've got. Build changes commit. No on dist artifacts. Yes on tooling configs (vitest, lint, etc.). EventTarget vs web-ifc parser stream. Cool idea, but its own PR. Pinning the splitter to web-ifc's class shapes is worth a deliberate decision. Merge order. This one first, your next as a dedicated PR. Single-purpose merges. What do you think? |
|
I agree with everything.
Lint should catch these kind of errors.
What is better? |
Let's think about this. What use cases will the splitter cater for? Will a consumer want to run another stream transformer after the splitter? |
|
I am rethinking export interface IfcSplitterIO {
/**
* Streams ifc lines
* @param path
* @throws if {@link path} doesn't exist
*/
readableStream(path: string): Promise<ReadableStream<string>>;
/**
* Streams ifc lines
* @param path
*/
writableStream(path: string): Promise<WritableStream<string>>;
}Maybe on web we won't use the path arg. I think I will try to move it up to the caller. export interface IfcSplitterIO {
/**
* Streams ifc lines
* @param path
* @throws if {@link path} doesn't exist
*/
readableStream(): Promise<ReadableStream<string>>;
/**
* Streams ifc lines
* @param path
*/
writableStream(groupId: number): Promise<WritableStream<string>>;
} |
|
I was toying with an idea to stream the ifc to IndexedDB, setting each localId/line as a key/value, before you suggested OPFS. |
|
regarding export class IfcSplitterNode extends IfcSplitter {
constructor() {
super({
writableStream: async (path) => {
+ await mkdir(dirname(path), { recursive: true });
const fileHandle = await open(path, "w");
const nodeWritable = fileHandle.createWriteStream();
return Writable.toWeb(nodeWritable) as WritableStream<string>;
},
});
}
} |
|
ShaMan123
left a comment
There was a problem hiding this comment.
- extracted emit*Line to top level
- refactored events
- fix + standardize progress events:
extractfiresresolvebeforerelations, not sure what to do, please take a look. The code is very fragile so I don;t want to move around stuff without tests.
ready for another review
| @@ -1057,11 +1038,12 @@ export class IfcSplitter { | |||
| } | |||
| } | |||
| } | |||
| this.emitProgressEvent("relations", relationsStart); | |||
Description
This PR is the first step outlined by #180 (comment)
It removes path/fs deps interface in favor of a lightweight specialized io interface based on
ReadableStream/WritableStream.The goal is to create an agnostic API that can run both in node and the browser.
Hence the usage of
ReadableStream/WritableStreamandEventTarget.All console logs have been removed.
console.errornow throws.console.warnemits awarningevent.console.timeemits aprogressevent.There is another
dataevent that is emitted once all groups have been constructed, seeIfcSplitterEvents.The rest have been removed.
Standalone functions have been scoped under
IfcSplitterfor ease of use and overall code quality.BREAKING changes
This change is breaking:
splitis now async and has changed signature to align with environment agnostic requirements.extractis now async.Additional context
What is the purpose of this pull request?
Before submitting the PR, please make sure you do the following:
feat(examples): add hello-world example).fixes #123).