Json import sorting #4703

Closed
ghego opened this issue Aug 28, 2013 · 22 comments · Fixed by #5287
Labels
Docs IO JSON read_json, to_json, json_normalize
Comments

@ghego

ghego commented Aug 28, 2013

I noticed that if I export a df to a JSON file and then reload the data, the index ordering is not kept.

import pandas as pd

# df is any DataFrame with an integer index (1, 2, 3, ...)
df.to_json('test.json')
newdf = pd.read_json('test.json')

print(newdf.index)

gives:

1
10
100
....

instead of

1
2
3
....

It suffices to call newdf = pd.read_json('test.json').sort_index() to get what I'd expect.

Is this intended behaviour or a bug?

@jtratner
Contributor

The JSON emitted is a dictionary object, not an array, so it's inherently
unordered. It's not possible to maintain ordering (and it's outside of the
JSON spec to do so).
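
To illustrate the point about the emitted JSON, here is a minimal sketch that uses only Python's standard `json` module rather than pandas' own serializer, so treat it as an analogy under that assumption. The `column` dict and its values are hypothetical. It shows that integer labels turn into strings once they are written as JSON object keys, which is why the index comes back as strings before any conversion.

```python
import json

# Hypothetical column with an integer index, modelled as a plain dict.
column = {1: "a", 2: "b", 10: "c"}

# JSON object keys are always strings, so the integer labels are stringified.
encoded = json.dumps({"foo": column})
decoded = json.loads(encoded)

print(encoded)
print(list(decoded["foo"].keys()))
```

Python's decoder happens to keep the keys in document order, but the JSON specification itself describes an object as an unordered collection, so other tools are free to reorder them.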

@cpcloud
Member

cpcloud commented Aug 29, 2013

No, it's sorting lexicographically, most likely because the index is sorted before being converted to a numeric dtype. I think this is a bug.
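
The difference between the two orderings can be sketched in plain Python. The labels below are illustrative, not taken from the reporter's file. Sorting the labels while they are still strings gives the order reported above; sorting after converting to integers gives the expected one.

```python
labels = ["1", "2", "3", "10", "15", "100", "101"]

# Strings compare character by character, so "10" lands before "2".
lexicographic = sorted(labels)

# Converting to int first gives the numeric order the reporter expected.
numeric = sorted(labels, key=int)

print(lexicographic)  # ['1', '10', '100', '101', '15', '2', '3']
print(numeric)        # ['1', '2', '3', '10', '15', '100', '101']
```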

@jtratner
Contributor

@cpcloud sure, that may be, but fundamentally, you can't guarantee that whatever ordering was there initially will be in the final index. For example:

df = pd.DataFrame([list(range(10))] * 10, index=range(10, 0, -1))
pd.read_json(df.to_json())

There's no way that you can ensure that the index falls out correctly. But yes, it's clearly lexicographically sorting.

@cpcloud
Member

cpcloud commented Aug 29, 2013

hm....okay...but why is it sorting at all?

@Komnomnomnom
Contributor

The JSON code initially decodes to a dict with string keys, which it then passes to the DataFrame constructor. I think the sorting might be happening after that point:

In [9]: d = {'foo': {'1': 1, '2': 2, '3': 3, '10': 10, '15': 15, '100': 100, '101': 101}}

In [10]: pd.DataFrame(d)
Out[10]: 
     foo
1      1
10    10
100  100
101  101
15    15
2      2
3      3

@cpcloud
Member

cpcloud commented Aug 29, 2013

Yep, that'll do it! dicts are sorted in the DataFrame constructor.

@jreback
Contributor

jreback commented Aug 29, 2013

Passing a list-of-lists will not sort (though it may be more expensive to create).
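
One way to get such a list-of-lists from JSON text without going through a dict is sketched below, using the standard `json` module's `object_pairs_hook` argument. The input string and the variable names are hypothetical, and this is not what pandas does internally.

```python
import json

# Hypothetical JSON text in the default column-oriented layout.
text = '{"foo": {"1": 1, "2": 2, "3": 3, "10": 10, "100": 100}}'

# object_pairs_hook=list makes the decoder return each JSON object as a list
# of (key, value) pairs in document order instead of building a dict.
pairs = json.loads(text, object_pairs_hook=list)

column_name, cells = pairs[0]
index = [int(label) for label, _ in cells]  # numeric labels, document order
rows = [[value] for _, value in cells]      # list-of-lists, one row per label

print(column_name, index, rows)
```

The resulting `rows`, `index`, and `[column_name]` could then be handed to `pd.DataFrame(rows, index=index, columns=[column_name])`, which takes the row order as given.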

@Komnomnomnom
Contributor

BTW @ghego, if you want to ensure that order is preserved during round-trip JSONifying, try using orient='split':

In [27]: pd.read_json(df.to_json(orient='split'), orient='split')
Out[27]: 
     foo
1      1
2      2
3      3
4     10
10    11
11    20
100  100
101  101
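
A self-contained version of this round trip might look like the sketch below. The frame and its deliberately unsorted index are hypothetical, and wrapping the payload in `StringIO` is an assumption made so the snippet also runs on newer pandas releases, which discourage passing a literal JSON string to `read_json`.

```python
from io import StringIO

import pandas as pd

# Hypothetical frame whose index is deliberately not in sorted order.
df = pd.DataFrame({"foo": [10, 1, 100, 2]}, index=[10, 1, 100, 2])

# orient="split" writes the index, columns, and data as JSON arrays.
# Arrays are ordered, so row order does not depend on key ordering.
payload = df.to_json(orient="split")

restored = pd.read_json(StringIO(payload), orient="split")

print(list(restored.index))
```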

@cpcloud
Member

cpcloud commented Aug 29, 2013

Might be nice to have that in the docs/cookbook.

@Komnomnomnom
Contributor

@cpcloud agreed. There's a couple of things I'd like to add to the JSON docs (incl a couple of benchmarks and more about the numpy param and when it can be useful). I'll try and get a PR together for this stuff before the next release.

@jtratner
Contributor

jtratner commented Sep 8, 2013

So I think this can be closed because it's expected behavior?

@jreback
Contributor

jreback commented Sep 8, 2013

I think @Komnomnomnom is going to take a look, so let's leave this open for a bit.

@Komnomnomnom
Contributor

Just to clarify, I'm going to update the docs to give more detail on the different orients, and make it clear when you can expect non-order-preserving behaviour. Apart from that, I wasn't going to change anything.
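
As a rough guide to the orients mentioned here, the sketch below serializes one small, hypothetical frame with four of them and reports whether the top-level JSON value is an array or an object. It only inspects the emitted layout and makes no claim about how `read_json` treats each one.

```python
import json

import pandas as pd

# Hypothetical frame with an index that is not in sorted order.
df = pd.DataFrame({"foo": [10, 1]}, index=[10, 1])

# Parse each payload back with the standard json module to inspect its shape.
layouts = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ("split", "records", "index", "columns")
}

for orient, parsed in layouts.items():
    print(orient, type(parsed).__name__, parsed)
```

"split" and "records" carry the rows in JSON arrays, which are ordered, although "records" does not include the index labels. "index" and "columns" key the rows by label inside JSON objects, which is where ordering can be lost.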

ghost assigned jtratner Sep 15, 2013

@jreback
Contributor

jreback commented Sep 24, 2013

@Komnomnomnom changing this to docs only, for when you have a chance.

@jreback
Contributor

jreback commented Oct 4, 2013

@Komnomnomnom how's docs coming on this?

@jreback
Contributor

jreback commented Oct 11, 2013

@Komnomnomnom do we need anything in docs for this? (I believe you did cover it...)...lmk

@Komnomnomnom
Contributor

@jreback, sorry yeah still planning to get to the json docs at some point. Hopefully will have some time over the next few days.

@jreback
Contributor

jreback commented Oct 11, 2013

gr8

@jreback
Contributor

jreback commented Oct 16, 2013

@Komnomnomnom docs?

@Komnomnomnom
Contributor

I've done a rough pass but still struggling to find adequate time I'm afraid. Last chance this weekend? :)

@jreback
Contributor

jreback commented Oct 16, 2013

docs are ok even after release-candidate....so ok

@Komnomnomnom
Contributor

@jreback #5287 eventually got there. Thanks for the reminders.


5 participants