I am new to python dataframes so please help me do a merge on pandas dataframe.

df1

custid   custname    email              phone
 x         tina     z.gmail.com        345-345-3456
 y         mina     z1.gmail.com       445-345-3456
 z         zina     z2.gmail.com       555-345-3456
 q         pina                        233-456-3456

df2

custid   custname    email              phone
 x         tina     z.gmail.com        345-345-3456
 y         xina     z1.gmail.com       445-345-3456
 k         tina     tina@gmail.com    703-234-3456
 q         pina     pina@gmail.com    233-456-3456

I want the desired output.

Insert update and delete .Update should happen if the value in not null in df1.If it is null in df1 and have value in df2 then don't update.

df3

custid   custname    email              phone                    Action
 x         tina     z.gmail.com        345-345-3456               None
 y         mina     z1.gmail.com       445-345-3456               Update
 z         zina     z2.gmail.com       555-345-3456               Insert
 k         tina      tina@gmail.com    703-234-3456               Delete
 q         pina     pina@gmail.com     233-456-3456                 None

Recommended Answers

All 4 Replies

Please any help?

Sorry, unfortunately I don't know python so I can't help you. I'm going to upvote your post so hopefully someone comes along who can help.

I am not deeply conversant with this area. Anyhow, I think the merge is one line of code. Psuedo code only so you can rewrite to match your needs.

I think I'd do the merge first before working on the duplicates.

One line? Try:

df1.append(df2)

Try that and see what happens. To clean up duplicates consider:

df1.drop_duplicates(subset=['custname'], keep='last')

Remember that this is just my untested thoughts here and you can tailor as you see fit.

PS. I see there is a merge feature but it's an area I have yet to explore. Try finding tutorials on this area.

df3 = pd.concat([df1, df2])
df3 = df3[df3["email"].notnull()].drop_duplicates(subset = ["email"])

This code produces what you need

commented: Excellent. My thought was to keep df1 as the complete result. Now the OP can pick and choose! +16
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.