Avoid judging VS Code chat history from observed data alone#

Last time, I wrote export_sessions.py, a script that exports Copilot Chat sessions, so that I could later write articles about my interactions with AI.

But once I actually tried using it, several questions came up right away.

  • The output of --list does not match the chat history list in VS Code

  • GitHub Copilot chat and CODEX chat seem to be recorded differently

  • Can interactions on the CODEX side be extracted through the same latest-log workflow?

In VS Code, some entries in the chat history list have icons and others do not. The displayed content also changes when switching between “Chat” and “CODEX” at the top.

At first, judging from the appearance, I wondered whether entries with icons might be CODEX sessions.

At the root of this question was a simple feeling that something did not add up.

The CODEX session history is clearly visible in the GUI: switching tabs shows the history on the CODEX side. But the history that my script reports does not match it.

If something exists on the screen but cannot be found from the script, it is more than just “no data.” Something about the storage location I am looking at, the format I am reading, or the conditions I am evaluating must be out of sync with how VS Code actually handles it.

I had to pause here for a moment.

If you build the judgment logic only from the saved data at hand, you end up with a script tuned to the features you happen to use, and it may miss sessions in other modes or states.

Because of this unease, I changed the policy of export_sessions.py from “go by the observed data” to “prioritize the specification and supplement it with observed data.”

The First Problem I Saw#

The existing export_sessions.py mainly looked at the following locations:

Code/User/workspaceStorage/<workspace-id>/GitHub.copilot-chat/transcripts/

Reading the .jsonl files under this directory is enough to extract the main text of Copilot Chat.
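As a rough illustration of that reading step, here is a minimal sketch. It is not the actual code of export_sessions.py. The function names are hypothetical, and treating the file name without its extension as the session ID is an assumption; the record layout inside each line is deliberately left uninterpreted.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, skipping broken lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written or corrupted line
            if isinstance(record, dict):
                yield record


def load_transcripts(transcripts_dir: Path) -> dict[str, list[dict]]:
    """Map each .jsonl file stem to its records.

    Assumption: the file stem is the session ID. Verify this against real data.
    """
    return {
        path.stem: list(iter_jsonl(path))
        for path in sorted(transcripts_dir.glob("*.jsonl"))
    }
```

Skipping unparseable lines instead of raising keeps an export usable even if a session file is still being written.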

However, transcripts alone was not enough to display information similar to the chat history list in VS Code.

The list shows the session title, creation date and time, status, type, and so on. Not all of these can be recovered from transcripts; chatSessions under workspaceStorage also holds clues.

In other words, the data sources that should be looked at differ depending on the purpose.

  • transcripts is important to extract the main text

  • chatSessions is also important to get closer to the VS Code list

  • Both need to be unified by session ID
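The joining step in the last bullet can be sketched as follows. This is only an illustration: SessionEntry and merge_sessions are hypothetical names, and both inputs are assumed to be dictionaries already keyed by session ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionEntry:
    """One session as seen from both data sources (hypothetical structure)."""
    session_id: str
    metadata: Optional[dict] = None                 # from chatSessions (title, dates, ...)
    transcript: list = field(default_factory=list)  # from transcripts (main text)


def merge_sessions(transcripts: dict, chat_sessions: dict) -> dict:
    """Join the two sources on session ID, keeping sessions found in only one."""
    merged = {}
    for session_id in sorted(set(transcripts) | set(chat_sessions)):
        merged[session_id] = SessionEntry(
            session_id=session_id,
            metadata=chat_sessions.get(session_id),
            transcript=transcripts.get(session_id, []),
        )
    return merged
```

Taking the union of IDs rather than the intersection makes it visible when a session appears in one source but not the other, which is exactly the kind of mismatch being investigated here.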

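At this point, it is fairly clear why --list is out of sync with VS Code’s list: the script was looking mainly at transcripts, while part of what the list shows is only hinted at in chatSessions.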

Why relying only on observed data felt risky#

In the first implementation proposal, I looked at the chatSessions data at hand and tried to judge whether a session was CODEX-like from its mode.kind and mode.id values.

For example, in the observed data, mode.kind was set to agent. Also, mode.id contained values such as agent and Plan.agent.md.

Looking only at this part, it is tempting to write a rule like, “If mode.kind=agent, it might be CODEX.”

But this is dangerous.

The data I have at hand is just the result of the operations I tried that day. It does not cover all combinations of Ask, Plan, and Agent, and it is not possible to know from the actual data alone how the internal implementation of VS Code or the extension stores the values.

At this stage, the question “Is this evidence enough to base a judgment on?” had to take priority over the feeling that “I think I can implement this.”

Rethinking the approach from the specification#

So I decided to check the public information on VS Code itself and the related extensions, and to sort out which parts can be relied on as specification and which parts have to be filled in from the data at hand.

What I particularly wanted to check was what the “icon on the left” displayed in the VS Code session history list meant.

From what I found, it seems reasonable to assume at least the following.

  • The icon on the left does not simply indicate whether the session is CODEX or not.

  • Status icons such as in progress, input required, failed, unread, etc. may take precedence.

  • If it is not a status icon, an icon derived from the session type or provider appears.

  • The main basis for CODEX judgment should be sessionType and providerType rather than the model name.

In other words, the judgment that “it is CODEX because it has an icon” is too rough.

On the other hand, basing CODEX detection on evidence close to what drives VS Code’s icon display is the right direction.

This difference was important.

Deciding the priority for judgment#

In the end, I decided that export_sessions.py should determine the session type using the following priorities.

  1. Whether sessionType or providerType is CODEX-based

  2. Whether the scheme of the session resource is CODEX-based

  3. Whether the mode information looks CODEX-like, as a supplementary signal

  4. Whether the model name suggests CODEX, as a last resort

The reason for this order is simple.

I wanted to give priority to information close to what the VS Code UI treats as the session type, and to use mode and model name only as supporting evidence.

Judging only by whether the model name includes codex is likely to break when the model name changes in the future. Also, the model name does not necessarily have the same meaning as the UI classification of which provider the session belongs to.

I should not ignore observed data, but I should not make it the main source of truth either. The specification and the UI model come first; observed data is there to verify that model and adjust for the storage format.
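The priority order described in this section could be sketched roughly like this. It is an illustration under assumptions, not the actual implementation. Only sessionType, providerType, mode.kind, and mode.id are names mentioned in this article; the keys "resource" and "model", and the idea that CODEX-related values contain the substring "codex", are hypothetical and need checking against real saved data.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

CODEX_MARKER = "codex"  # assumption: CODEX-related values contain this substring


def _mentions_codex(value: object) -> bool:
    return isinstance(value, str) and CODEX_MARKER in value.lower()


def _scheme(resource: object) -> str:
    """Return the URI scheme of a resource string, or "" if unavailable."""
    if not isinstance(resource, str):
        return ""
    try:
        return urlsplit(resource).scheme
    except ValueError:
        return ""


def classify_session(meta: dict) -> Tuple[str, Optional[str]]:
    """Return (session_type, evidence); evidence names the signal that matched."""
    # 1. sessionType / providerType: closest to what the UI treats as the type
    for key in ("sessionType", "providerType"):
        if _mentions_codex(meta.get(key)):
            return "codex", key

    # 2. scheme of the session resource ("resource" is a hypothetical key)
    if _mentions_codex(_scheme(meta.get("resource"))):
        return "codex", "resource scheme"

    # 3. mode information, supplementary only
    mode = meta.get("mode")
    if isinstance(mode, dict) and any(
        _mentions_codex(mode.get(k)) for k in ("kind", "id")
    ):
        return "codex", "mode"

    # 4. model name, last resort ("model" is a hypothetical key)
    if _mentions_codex(meta.get("model")):
        return "codex", "model name"

    return "chat", None
```

Returning the evidence alongside the type makes it easy to see later whether a session was classified from a strong signal or only from the model name.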

What I changed in the implementation#

In this work, I updated export_sessions.py along the following lines.

  • Add JSONL parsing for chatSessions

  • Integrate transcripts and chatSessions by session ID

  • Allow --type to select all / chat / codex

  • Allow --recent N to select the most recent N items

  • Give priority to provider / sessionType for CODEX judgment

  • Correctly read the kind=0 initial snapshot in chatSessions

  • Normalize to datetime if creationDate is epoch milliseconds
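As an illustration of the last item in the list above, a creationDate normalizer might look like the sketch below. The function name is hypothetical. Treating every numeric value as epoch milliseconds follows the description above; accepting ISO 8601 strings and interpreting naive values as UTC are additional assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def normalize_creation_date(value: object) -> Optional[datetime]:
    """Convert a creationDate value to a timezone-aware datetime, or None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; never treat it as a timestamp
    if isinstance(value, (int, float)):
        try:
            # Numeric values are treated as epoch milliseconds
            return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
        except (OverflowError, OSError, ValueError):
            return None
    if isinstance(value, str):
        try:
            parsed = datetime.fromisoformat(value)
        except ValueError:
            return None
        # Assumption: a string without an offset is in UTC
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    return None
```

Returning None for anything unrecognized lets the list output show a blank date instead of failing on a single odd record.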

For example, the commands look like this.

rye run python export_sessions.py --list --recent 5
rye run python export_sessions.py --list --type codex --recent 5
rye run python export_sessions.py --latest
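The options used in these commands could be wired up with argparse roughly as follows. This is a sketch, not the script’s actual parser: the help texts, the default of all for --type, and the session_type attribute name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="export_sessions.py")
    parser.add_argument("--list", action="store_true",
                        help="print a session list instead of exporting")
    parser.add_argument("--latest", action="store_true",
                        help="export only the most recent session")
    parser.add_argument("--type", dest="session_type",
                        choices=("all", "chat", "codex"), default="all",
                        help="filter sessions by type (default is an assumption)")
    parser.add_argument("--recent", type=int, metavar="N", default=None,
                        help="limit to the most recent N sessions")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting --type with choices means a typo such as --type codx is rejected by the parser instead of silently returning an empty list.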

With this, the foundation for producing a VS Code-like list and the foundation for exporting the main text log can be considered separately.

However, at this point the saved data did not contain enough providerType or sessionType signals to use for CODEX determination, and --type codex returned 0 entries.

At this point, the AI framed the situation as one in which the decision logic and the limits of the saved data had been cleanly separated.

However, I am still not convinced.

This is because the CODEX history can be checked in the VS Code GUI. If it can be displayed on the screen, the information that makes up the list must exist somewhere. If --type codex returns 0 entries in the script, it is more natural to read that as “my script has not yet reached the same basis as the GUI” than as “there is no CODEX history.”

Therefore, this result is not a finished outcome but work in progress. At least for me, it feels less like “I found the limit” and more like “I still have not fully mapped the relationship between the GUI and the saved data.”

In this article, I want to deliberately keep this sense of unease on record. Anyone who looks into where VS Code stores its chat history or CODEX sessions will probably get stuck at the same point.

What I learned#

The biggest thing I learned this time was that there is a difference between a judgment that merely works and a judgment that has a solid basis.

If you look at the JSONL at hand, you can create logic that operates on data saved on that day. However, that alone does not tell us what the VS Code UI uses to compose the list, or what information the extension handles as the session type.

In particular, when the purpose is to create a list similar to what is displayed on the screen, as in this case, it is necessary to look not only at the shape of the saved file, but also at what concepts the UI is displaying.

Another important point was that status icons and type icons are mixed in the same position.

Just because there’s an icon on the left doesn’t always mean CODEX. It may be an icon to indicate the status, or it may be an icon to indicate the type of provider.

If this distinction is handled loosely, the list and the filter will later end up meaning something different from what was intended.

What to check next#

Next time, I’d like to rerun export_sessions.py immediately after opening a CODEX session to see which identifiers are stored in that session.

In addition, I would like to find out more directly which data source and provider information the GUI’s “Chat/CODEX” switch is connected to internally. Since the history appears in the list on the screen, there must be an entry point that builds that list somewhere in VS Code itself or in an extension.

In particular, I would like to see the following information:

  • Are sessionType and providerType saved?

  • Does CODEX information appear in the resource scheme?

  • What are the differences between Ask, Plan, and Agent in the save format?

  • To what extent do the order, titles, and dates and times in the VS Code list match the output of --list?

  • How can I find the correspondence between the history displayed on the CODEX tab of the GUI and the locally saved file?

If I can confirm these points, I should be able to take the accuracy of --type codex one step further.
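One way to start on the first two questions is a small diagnostic that walks every saved JSONL record and reports where sessionType or providerType appear, and where any string value mentions codex. The sketch below is a hypothetical helper, not part of export_sessions.py. It makes no assumption about the record layout and simply reports dotted key paths.

```python
import json
from collections import Counter
from pathlib import Path

KEYS_OF_INTEREST = ("sessionType", "providerType")


def walk(node, path=""):
    """Yield (dotted_path, key, value) for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield child, key, value
            yield from walk(value, child)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, f"{path}[]")


def scan_file(path: Path) -> Counter:
    """Count where type-related keys and codex mentions occur in one JSONL file."""
    hits = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip blank or broken lines
            for dotted, key, value in walk(record):
                if key in KEYS_OF_INTEREST:
                    hits[f"{dotted}={value!r}"] += 1
                elif isinstance(value, str) and "codex" in value.lower():
                    hits[f"{dotted} mentions codex"] += 1
    return hits
```

Running scan_file over the files saved immediately after opening a CODEX session, and comparing the counts with those from an ordinary chat session, should show whether any identifier actually distinguishes the two.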

Closing#

Building a mechanism for recording interactions with AI as logs turned out to be more than just a matter of reading JSONL.

Do you want to extract the main text, do you want to get closer to the VS Code list, or do you want to separate GitHub Copilot and CODEX? The data you should look at and the basis you should rely on will change depending on your purpose.

This time, I am glad that, after looking at the observed data and nearly implementing everything at once, I stopped and asked myself, “Is this really enough to understand the specification?”

At the same time, I also felt that even if the AI has organized things in a reasonable way, if the human side is still not convinced, there are still questions worth writing about.

The history is visible in the GUI, yet I still cannot retrieve it with the script.

The next research topic will be how to fill this gap.

Value the observations at hand, but do not be led by them alone.

Even a small script for personal use becomes easier to develop later if you think in this order: specification first, observed data second.

Article information

Author: mtakagishi

Published: 2026-05-30