chaos_emergent 7 months ago

As I sit in front of my computer waiting for a transformation-for-import job to complete, I can describe my basic workflow:

1. define a clean interface target - for me, that's an interface that I made for my startup to import call data.

2. explore the data a little to get a sense of transformation mappings.

3. create a PII-redacted version of the file, upload it to ChatGPT along with the shape of my interface, and ask it to write a transformation script in Python (a sketch of what that typically looks like is below).

4. run it on a subset of the data locally to verify that it works.

5. run it in production against my customer's account.
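
For concreteness, this is roughly the kind of script step 3 produces for me; the file names, column names, and redaction rules here are placeholders rather than my actual interface, so treat it as a sketch:

    import pandas as pd

    # Placeholder source export and target interface; the real column names differ.
    SOURCE_FILE = "calls_export.csv"
    TARGET_COLUMNS = {              # source column -> target interface field
        "Call Start": "started_at",
        "Duration (s)": "duration_seconds",
        "Agent Name": "agent_id",
        "Disposition": "outcome",
    }
    PII_COLUMNS = ["Caller Phone", "Caller Email"]  # stripped before anything goes to ChatGPT

    df = pd.read_csv(SOURCE_FILE)

    # Step 3: a PII-redacted sample to share alongside the interface shape.
    df.drop(columns=PII_COLUMNS, errors="ignore").head(50).to_csv("redacted_sample.csv", index=False)

    # The transformation itself: rename, keep only target fields, coerce types.
    out = df.rename(columns=TARGET_COLUMNS)[list(TARGET_COLUMNS.values())]
    out["started_at"] = pd.to_datetime(out["started_at"], errors="coerce")
    out["duration_seconds"] = pd.to_numeric(out["duration_seconds"], errors="coerce")

    # Step 4: spot-check a subset locally before the production run.
    print(out.head(10))
    out.to_csv("transformed_for_import.csv", index=False)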

I'm curious - that seems like a reasonably standard flow, and while it involves a bit of manual work, it feels like the right tradeoff between toil and automation. Do you struggle with that workflow, or do you think it could be better somehow?

dataflowmapper 7 months ago

Thanks for sharing that workflow. For more straightforward flows, that sounds like a decent approach. My main thoughts on where it could be improved, or where I see potential struggles, are when:

  - People aren't comfortable or familiar with coding/Python.
  - You get into more complex imports like historical or transactional data. There you might have 15 transaction types to map, each with different fields, math, and conditional logic; the requirements become too much to prompt ChatGPT with effectively, and iterating on the Python can get pretty involved.
  - The source data structure and transformation needs vary from case to case, leading to a lot of 'throwaway' or heavily modified scripts for each unique import.
  - Tasks like VLOOKUPs or enriching data come into play, which add manual coding and complexity beyond a simple 1-to-1 source-to-destination script (a rough sketch of what that looks like follows below).
These are the areas where I'm exploring whether a more structured approach could offer benefits in terms of repeatability and accessibility, both for a wider range of users and for more complex scenarios. Appreciate the insight into your process and your thoughts on this.
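
To make the second and last bullets concrete, here's a rough sketch of the dispatch-plus-lookup code those imports tend to require; the transaction types, field math, file names, and lookup columns are all hypothetical:

    import pandas as pd

    # Hypothetical per-type mapping logic; a real historical import can have a dozen or more of these.
    def map_invoice(row):
        return {"type": "invoice", "amount": row["Gross"] - row["Tax"], "ref": row["InvoiceNo"]}

    def map_payment(row):
        return {"type": "payment", "amount": -abs(row["Amount"]), "ref": row["PaymentId"]}

    def map_credit(row):
        return {"type": "credit_memo", "amount": -row["Gross"], "ref": row["MemoNo"]}

    HANDLERS = {"INV": map_invoice, "PAY": map_payment, "CRD": map_credit}

    source = pd.read_csv("transactions.csv")       # hypothetical source export
    accounts = pd.read_csv("account_lookup.csv")   # hypothetical lookup table (the VLOOKUP part)

    # Dispatch each row to the handler for its transaction type; an unknown type fails loudly.
    mapped = pd.DataFrame(source.apply(lambda r: HANDLERS[r["TxnType"]](r), axis=1).tolist())

    # Enrichment: join destination account IDs on the reference key instead of a spreadsheet VLOOKUP.
    enriched = mapped.merge(accounts[["ref", "account_id"]], on="ref", how="left")
    print(enriched.head())

Every new customer file ends up tweaking the handlers and the lookup keys, which is exactly where the throwaway-script problem shows up.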
sargstuff 7 months ago

From doing historical data restorations (pre-JSON): using documented industry/standard structures and protocols, plus normalized data formats/layouts, helps out quite a bit.

Note: awk/shell scripts, roughly '80s vintage, because they're relatively portable across platforms (Mac, PC, DEC, IBM (ignoring EBCDIC), etc.).