138
COMMENT 9d ago
Nah he just streams Ace Attorney games 24/7
1
COMMENT 16d ago
You cannot use BI tools directly on sorted column-store relational databases, when you have Big Data.
This is just patently false, even for Redshift.
2
COMMENT 17d ago
It is. It's not listed as one.
1
COMMENT 17d ago
AWS S3 isn't a cloud technology?
7
COMMENT 18d ago
Number 2 seems like the biggest pain point. Ditch Talend/Pentaho for an orchestration tool that's either more modern (Prefect, Dagster, etc.) or more widely used (e.g., Airflow).
Depending on how heavy the T in your ETL is, you might want to move some of it into Spark (EMR).
Alternatively, many companies are choosing an ELT approach, which might benefit you since you already have a Redshift cluster. It just depends.
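To make the ELT idea concrete, here's a rough sketch: land the raw rows untouched, then do the T with SQL inside the warehouse. Everything here is an assumption for illustration: sqlite3 stands in for Redshift so it runs anywhere, and the table names (raw_orders, daily_revenue) and sample rows are made up.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
# Amounts are strings on purpose: in ELT the load step does no cleanup.
RAW_ORDERS = [
    ("o-1", "alice", "12.50", "2021-05-01"),
    ("o-2", "bob", "7.25", "2021-05-01"),
    ("o-3", "alice", "3.00", "2021-05-02"),
]


def extract_and_load(conn):
    """E and L: land the rows as-is, every column stored as text."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform(conn):
    """T: cast and aggregate inside the database with plain SQL."""
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               SUM(CAST(amount AS REAL)) AS revenue,
               COUNT(*) AS orders
        FROM raw_orders
        GROUP BY order_date
        """
    )


def run_pipeline():
    # In-memory SQLite is only a stand-in for the real warehouse connection.
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn)
    transform(conn)
    return conn.execute(
        "SELECT order_date, revenue, orders FROM daily_revenue ORDER BY order_date"
    ).fetchall()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In a real setup the two steps would be separate tasks in whatever orchestrator you pick, and the transform would run on the cluster instead of in the pipeline process.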
4
COMMENT 20d ago
What would a PhD in DE offer compared to a PhD in CS focusing on ML or distributed systems?
2
COMMENT 23d ago
Kum
1
COMMENT 26d ago
120, TaylorMade, IPAs
1k
COMMENT Jun 14 '21
Uppercase makes the DB work harder.
3
COMMENT Jun 10 '21
Best way to start the day.
19
COMMENT Jun 09 '21
Bruh 120 is the new 80
3
COMMENT Jun 08 '21
All you need is DS & A, which will be the key thing GT is looking for. Consider everything else as supplemental.
1
COMMENT May 29 '21
I know that you're right but I hate that booby trapping isn't legal.
5
COMMENT May 28 '21
Since I work in data, I'm going to give a data-specific lessons learned. Fuck pandas.
Preach.
1
COMMENT May 26 '21
Nice! Congratulations! That's a huge bump dude!
2
COMMENT May 16 '21
I'm saving this comment for later lol.
3
COMMENT May 16 '21
As a data engineer, there are so many considerations I can think of...
1) size of data and frequency of execution
2) complexity of transformations
3) source & destination of the data (also network considerations)
4) other sources of complexity (team knowledge of Python/pandas vs SQL, testability, managing Python environments and dependencies, etc.)
5) experimentation vs production code
99% of the time there is NO reason to use pandas if your source and target DB are the same, so that's a gimme for SQL.
Most of the time you can refactor pandas transformations into SQL, which I will do if I'm productionizing some DS code.
Everything else is case by case, based on the above.
I'd suggest every data scientist dive deep into not just SQL but how databases work in general, to help gauge these tradeoffs.
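As a rough illustration of the "same source and target DB" case, here's a sketch of a pandas filter + groupby rewritten as one SQL statement, so the data never leaves the database. The events table, its columns, and the sample rows are hypothetical, and sqlite3 is only a stand-in for whatever database you actually use. The pandas version is shown as a comment and isn't executed.

```python
import sqlite3

# The pandas version this replaces would look roughly like (not executed here):
#   df = pd.read_sql("SELECT * FROM events", conn)
#   out = (df[df["status"] == "ok"]
#            .groupby("user_id", as_index=False)["duration_ms"].mean())
#   out.to_sql("user_avg_duration", conn, if_exists="replace", index=False)


def build_user_avg_duration(conn):
    """Same filter + groupby + mean, done entirely inside the database."""
    conn.execute("DROP TABLE IF EXISTS user_avg_duration")
    conn.execute(
        """
        CREATE TABLE user_avg_duration AS
        SELECT user_id, AVG(duration_ms) AS duration_ms
        FROM events
        WHERE status = 'ok'
        GROUP BY user_id
        """
    )


def demo():
    # Hypothetical source table with a few sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, status TEXT, duration_ms REAL)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            (1, "ok", 100.0),
            (1, "ok", 300.0),
            (1, "error", 9000.0),  # dropped by the WHERE clause
            (2, "ok", 50.0),
        ],
    )
    build_user_avg_duration(conn)
    return conn.execute(
        "SELECT user_id, duration_ms FROM user_avg_duration ORDER BY user_id"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

The SQL version skips pulling every row over the network and pushing the result back, which is most of why it wins when nothing needs to leave the database.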
17
COMMENT May 16 '21
Maybe this is a great opportunity for you to learn how to use your debugger.
12
COMMENT May 16 '21
This was one of the top courses I was looking forward to but I can already tell it's going to be a mess.
Well it's certainly not the course's fault.
2
COMMENT May 12 '21
Why do you need it locally? Writing to some cloud storage (S3 or whatever Azure uses) and reading that from another notebook would be the easiest. From there you could download it locally if you really needed to.
1
COMMENT Apr 28 '21
This. Start with the dumb thing first. At the very least, use it as a baseline.
65
COMMENT Apr 27 '21
ExperiencedDevs, does these decisions make it more or less likely that you'll accept offers from companies that pull back from social politics at work?
Much more likely. I'm here to get paid, and nothing more. If I wanted to talk politics I'd live with my in-laws.
1
COMMENT Apr 26 '21
Professor is a moron.
1
COMMENT Apr 24 '21
You could simply query your S3 files using AWS Athena. You probably don't need a DB for your use case.
0
COMMENT 8d ago
Well, you'd be wrong.